Overview

Dataset statistics

Number of variables: 57
Number of observations: 152552
Missing cells: 2331880
Missing cells (%): 26.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 66.3 MiB
Average record size in memory: 456.0 B
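
The derived figures in this block follow from the raw counts. As a rough cross-check, the sketch below recomputes the missing-cell percentage and the average record size from the reported totals. The variable names are illustrative and not part of the report, and the memory figure is rounded in the source, so the record size is only approximate.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 57
n_observations = 152_552
missing_cells = 2_331_880
total_size_mib = 66.3  # rounded in the report, so results are approximate

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```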

Variable types

Categorical: 30
Numeric: 8
Unsupported: 14
DateTime: 4
Boolean: 1

Warnings

LAST_REFILL_SETTING has constant value "Facility" Constant
Facility Name has a high cardinality: 101 distinct values High cardinality
STATE has a high cardinality: 55 distinct values High cardinality
lga has a high cardinality: 579 distinct values High cardinality
REGIMEN has a high cardinality: 63 distinct values High cardinality
CAUSE_DEATH has a high cardinality: 103 distinct values High cardinality
WARD has a high cardinality: 1103 distinct values High cardinality
RECENCY_TESTING is highly correlated with RECENCY_CONSENT High correlation
State is highly correlated with entry_point and 10 other fields High correlation
BIOMETRIC is highly correlated with VIRAL_LOAD_TYPE and 2 other fields High correlation
entry_point is highly correlated with State and 8 other fields High correlation
OUTCOME is highly correlated with State and 6 other fields High correlation
TIME_HIV_DIAGNOSIS is highly correlated with State and 10 other fields High correlation
PREGNANT is highly correlated with TIME_HIV_DIAGNOSIS High correlation
education is highly correlated with State and 4 other fields High correlation
PATIENT_ID is highly correlated with State and 6 other fields High correlation
STATUS_REGISTRATION is highly correlated with entry_point and 4 other fields High correlation
ENROLLMENT_SETTING is highly correlated with State and 3 other fields High correlation
FACILITY_ID is highly correlated with State and 8 other fields High correlation
L.G.A is highly correlated with State and 14 other fields High correlation
STATE is highly correlated with State and 10 other fields High correlation
GENDER is highly correlated with SOURCE_REFERRAL High correlation
BREASTFEEDING is highly correlated with TIME_HIV_DIAGNOSIS High correlation
REGIMEN is highly correlated with STATUS_REGISTRATION and 5 other fields High correlation
VIRAL_LOAD_TYPE is highly correlated with BIOMETRIC and 1 other field High correlation
CBO_ID is highly correlated with ENROLLMENT_SETTING High correlation
age_unit is highly correlated with REGIMEN High correlation
SOURCE_REFERRAL is highly correlated with State and 6 other fields High correlation
tb_status is highly correlated with education and 1 other field High correlation
REGIMENTYPE is highly correlated with REGIMEN High correlation
CURRENT_STATUS is highly correlated with BIOMETRIC and 8 other fields High correlation
AGREED_DATE is highly correlated with BIOMETRIC and 7 other fields High correlation
RECENCY_CONSENT is highly correlated with RECENCY_TESTING and 5 other fields High correlation
OCCUPATION is highly correlated with L.G.A and 1 other field High correlation
UNIQUE_ID has 58108 (38.1%) missing values Missing
marital_status has 4682 (3.1%) missing values Missing
education has 9007 (5.9%) missing values Missing
OCCUPATION has 6239 (4.1%) missing values Missing
STATE has 2937 (1.9%) missing values Missing
lga has 2543 (1.7%) missing values Missing
entry_point has 47050 (30.8%) missing values Missing
DATE_CONFIRMED_HIV has 51946 (34.1%) missing values Missing
DATE_ENROLLED_PMTCT has 151488 (99.3%) missing values Missing
SOURCE_REFERRAL has 152177 (99.8%) missing values Missing
TIME_HIV_DIAGNOSIS has 152100 (99.7%) missing values Missing
tb_status has 51879 (34.0%) missing values Missing
ENROLLMENT_SETTING has 35250 (23.1%) missing values Missing
CBO_ID has 14321 (9.4%) missing values Missing
DATE_STARTED has 23198 (15.2%) missing values Missing
enrolled_ovc has 11229 (7.4%) missing values Missing
RECENCY_CONSENT has 144524 (94.7%) missing values Missing
RECENCY_TESTING has 14771 (9.7%) missing values Missing
REGIMENTYPE has 22804 (14.9%) missing values Missing
REGIMEN has 22799 (14.9%) missing values Missing
LAST_CLINIC_STAGE has 23531 (15.4%) missing values Missing
DATE_LAST_CD4 has 110846 (72.7%) missing values Missing
DATE_LAST_VIRAL_LOAD has 75477 (49.5%) missing values Missing
VIRAL_LOAD_DUE_DATE has 42522 (27.9%) missing values Missing
VIRAL_LOAD_TYPE has 42522 (27.9%) missing values Missing
DATE_LAST_REFILL has 22797 (14.9%) missing values Missing
DATE_NEXT_REFILL has 22797 (14.9%) missing values Missing
LAST_REFILL_SETTING has 127526 (83.6%) missing values Missing
DATE_LAST_CLINIC has 17811 (11.7%) missing values Missing
DATE_NEXT_CLINIC has 21263 (13.9%) missing values Missing
DATE_TRACKED has 150090 (98.4%) missing values Missing
OUTCOME has 133094 (87.2%) missing values Missing
CAUSE_DEATH has 150543 (98.7%) missing values Missing
AGREED_DATE has 152534 (> 99.9%) missing values Missing
BIOMETRIC has 11229 (7.4%) missing values Missing
PARTNERINFORMATION_ID has 152552 (100.0%) missing values Missing
WARD has 94466 (61.9%) missing values Missing
CBO_ID is highly skewed (γ1 = 33.81683773) Skewed
LAST_VIRAL_LOAD is highly skewed (γ1 = 39.67085197) Skewed
LAST_CD4 is highly skewed (γ1 = 280.0604144) Skewed
LAST_CD4P is highly skewed (γ1 = 81.37770847) Skewed
PATIENT_ID is uniformly distributed Uniform
PATIENT_ID has unique values Unique
HOSPITAL_NUM is an unsupported type, check if it needs cleaning or further analysis Unsupported
UNIQUE_ID is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_BIRTH is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_CONFIRMED_HIV is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_REGISTRATION is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_STARTED is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_CURRENT_STATUS is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_LAST_CD4 is an unsupported type, check if it needs cleaning or further analysis Unsupported
VIRAL_LOAD_DUE_DATE is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_LAST_REFILL is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_NEXT_REFILL is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_LAST_CLINIC is an unsupported type, check if it needs cleaning or further analysis Unsupported
DATE_NEXT_CLINIC is an unsupported type, check if it needs cleaning or further analysis Unsupported
PARTNERINFORMATION_ID is an unsupported type, check if it needs cleaning or further analysis Unsupported
CBO_ID has 138071 (90.5%) zeros Zeros
LAST_VIRAL_LOAD has 101214 (66.3%) zeros Zeros
LAST_CD4 has 113980 (74.7%) zeros Zeros
LAST_CD4P has 151427 (99.3%) zeros Zeros
LAST_REFILL_DURATION has 30765 (20.2%) zeros Zeros
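
Several of the Missing warnings describe columns that are almost entirely empty. One possible way to act on them is sketched below: it turns a handful of the reported counts into fractions and lists the columns above a cut-off. The 0.95 threshold and the names `missing_fraction` and `drop_candidates` are assumptions chosen for illustration; the report itself does not prescribe a threshold.

```python
N_ROWS = 152_552  # number of observations reported in the overview

# Missing-value counts copied from a few of the warnings in this report.
missing_counts = {
    "UNIQUE_ID": 58_108,
    "entry_point": 47_050,
    "RECENCY_CONSENT": 144_524,
    "DATE_TRACKED": 150_090,
    "CAUSE_DEATH": 150_543,
    "PARTNERINFORMATION_ID": 152_552,
}

# Hypothetical cut-off: an assumption for illustration, not a report setting.
DROP_THRESHOLD = 0.95


def missing_fraction(count, n_rows=N_ROWS):
    """Fraction of rows in which the column is missing."""
    return count / n_rows


# Columns whose missing fraction reaches the cut-off, in alphabetical order.
drop_candidates = sorted(
    name
    for name, count in missing_counts.items()
    if missing_fraction(count) >= DROP_THRESHOLD
)
print(drop_candidates)
```

Whether a column such as CAUSE_DEATH should really be dropped depends on the analysis: a mostly empty column can still be informative for the rows where it is filled.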

Reproduction

Analysis started: 2021-06-15 08:48:50.998973
Analysis finished: 2021-06-15 08:50:00.045376
Duration: 1 minute and 9.05 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
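
The reported duration can be checked against the two timestamps. The following is a small sketch using only the standard library; the format string is an assumption that matches how the timestamps are printed here.

```python
from datetime import datetime

# Format matching the timestamps as printed in the report (assumed).
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-06-15 08:48:50.998973", FMT)
finished = datetime.strptime("2021-06-15 08:50:00.045376", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```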

Variables

State
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 MiB
Akwa Ibom
83592 
Adamawa
25765 
Cross River
21654 
Niger
21541 

Length

Max length: 11
Median length: 9
Mean length: 8.381286381
Min length: 5

Characters and Unicode

Total characters: 1278582
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
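
To make the character-category figures concrete, the sketch below recomputes them for the State variable from its four value counts, using Python's standard `unicodedata` module. The dictionary `state_counts` is copied from this report; everything else is an illustrative reconstruction, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# Value counts copied from the State variable in this report.
state_counts = {
    "Akwa Ibom": 83_592,
    "Adamawa": 25_765,
    "Cross River": 21_654,
    "Niger": 21_541,
}

category_totals = Counter()
for value, count in state_counts.items():
    for ch in value:
        # unicodedata.category returns a two-letter general category code,
        # e.g. 'Lu' (uppercase letter), 'Ll' (lowercase letter), 'Zs' (space).
        category_totals[unicodedata.category(ch)] += count

total_chars = sum(category_totals.values())
for code, n in category_totals.most_common():
    print(f"{code}: {n} ({100 * n / total_chars:.1f}%)")
```

The totals agree with the report's figures for this variable: 915538 lowercase letters, 257798 uppercase letters, and 105246 space separators out of 1278582 characters.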

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Niger
2nd row: Niger
3rd row: Niger
4th row: Niger
5th row: Niger

Common Values

Value        Count  Frequency (%)
Akwa Ibom    83592  54.8%
Adamawa      25765  16.9%
Cross River  21654  14.2%
Niger        21541  14.1%

Length

Histogram of lengths of the category

Pie chart

Value    Count  Frequency (%)
akwa     83592  32.4%
ibom     83592  32.4%
adamawa  25765  10.0%
river    21654  8.4%
cross    21654  8.4%
niger    21541  8.4%

Most occurring characters

Value             Count   Frequency (%)
a                 160887  12.6%
A                 109357  8.6%
w                 109357  8.6%
m                 109357  8.6%
o                 105246  8.2%
(space)           105246  8.2%
k                 83592   6.5%
I                 83592   6.5%
b                 83592   6.5%
r                 64849   5.1%
Other values (9)  263507  20.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter915538
71.6%
Uppercase Letter257798
 
20.2%
Space Separator105246
 
8.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a160887
17.6%
w109357
11.9%
m109357
11.9%
o105246
11.5%
k83592
9.1%
b83592
9.1%
r64849
7.1%
s43308
 
4.7%
i43195
 
4.7%
e43195
 
4.7%
Other values (3)68960
7.5%
Uppercase Letter
ValueCountFrequency (%)
A109357
42.4%
I83592
32.4%
C21654
 
8.4%
R21654
 
8.4%
N21541
 
8.4%
Space Separator
ValueCountFrequency (%)
105246
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1173336
91.8%
Common105246
 
8.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a160887
13.7%
A109357
9.3%
w109357
9.3%
m109357
9.3%
o105246
9.0%
k83592
 
7.1%
I83592
 
7.1%
b83592
 
7.1%
r64849
 
5.5%
s43308
 
3.7%
Other values (8)220199
18.8%
Common
ValueCountFrequency (%)
105246
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1278582
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a160887
12.6%
A109357
8.6%
w109357
8.6%
m109357
8.6%
o105246
 
8.2%
105246
 
8.2%
k83592
 
6.5%
I83592
 
6.5%
b83592
 
6.5%
r64849
 
5.1%
Other values (9)263507
20.6%

L.G.A
Categorical

HIGH CORRELATION

Distinct38
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
Ikot Ekpene
20133 
Ibiono-Ibom
15075 
Essien Udim
13709 
Ogoja
13219 
Mubi South
9662 
Other values (33)
80754 

Length

Max length11
Median length7
Mean length7.496565106
Min length3

Characters and Unicode

Total characters1143616
Distinct characters42
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMagama
2nd rowMagama
3rd rowMagama
4th rowMagama
5th rowMagama

Common Values

Value              Count  Frequency (%)
Ikot Ekpene        20133  13.2%
Ibiono-Ibom        15075  9.9%
Essien Udim        13709  9.0%
Ogoja              13219  8.7%
Mubi South         9662   6.3%
Abak               7992   5.2%
Etim Ekpo          7702   5.0%
Itu                6583   4.3%
Numan              4697   3.1%
Obudu              4666   3.1%
Other values (28)  49114  32.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ekpene20133
 
9.7%
ikot20133
 
9.7%
ibiono-ibom15075
 
7.3%
udim13709
 
6.6%
essien13709
 
6.6%
ogoja13219
 
6.4%
south9662
 
4.7%
mubi9662
 
4.7%
abak7992
 
3.9%
etim7702
 
3.7%
Other values (33)76232
36.8%

Most occurring characters

ValueCountFrequency (%)
o129005
 
11.3%
i85560
 
7.5%
k74107
 
6.5%
n72863
 
6.4%
a68207
 
6.0%
I65794
 
5.8%
b57987
 
5.1%
e54903
 
4.8%
54676
 
4.8%
t50669
 
4.4%
Other values (32)429845
37.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter851562
74.5%
Uppercase Letter222303
 
19.4%
Space Separator54676
 
4.8%
Dash Punctuation15075
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o129005
15.1%
i85560
10.0%
k74107
8.7%
n72863
8.6%
a68207
8.0%
b57987
 
6.8%
e54903
 
6.4%
t50669
 
6.0%
u49055
 
5.8%
m42238
 
5.0%
Other values (13)166968
19.6%
Uppercase Letter
ValueCountFrequency (%)
I65794
29.6%
E49246
22.2%
O23402
 
10.5%
M17143
 
7.7%
S15544
 
7.0%
U13709
 
6.2%
A11462
 
5.2%
B6221
 
2.8%
N4697
 
2.1%
K3119
 
1.4%
Other values (7)11966
 
5.4%
Space Separator
ValueCountFrequency (%)
54676
100.0%
Dash Punctuation
ValueCountFrequency (%)
-15075
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1073865
93.9%
Common69751
 
6.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o129005
 
12.0%
i85560
 
8.0%
k74107
 
6.9%
n72863
 
6.8%
a68207
 
6.4%
I65794
 
6.1%
b57987
 
5.4%
e54903
 
5.1%
t50669
 
4.7%
E49246
 
4.6%
Other values (30)365524
34.0%
Common
ValueCountFrequency (%)
54676
78.4%
-15075
 
21.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1143616
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o129005
 
11.3%
i85560
 
7.5%
k74107
 
6.5%
n72863
 
6.4%
a68207
 
6.0%
I65794
 
5.8%
b57987
 
5.1%
e54903
 
4.8%
54676
 
4.8%
t50669
 
4.4%
Other values (32)429845
37.6%

Facility Name
Categorical

HIGH CARDINALITY

Distinct101
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
Ibiono Handmaids Hospital
12795 
Ikot Ekpene General Hospital
11267 
Mubi General Hospital
 
9662
Ukana Cottage Hospital
 
8090
Ogoja General Hospital
 
7090
Other values (96)
103648 

Length

Max length48
Median length25
Mean length25.8906799
Min length8

Characters and Unicode

Total characters3949675
Distinct characters55
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowRural Hosp- Auna
2nd rowRural Hosp- Auna
3rd rowRural Hosp- Auna
4th rowRural Hosp- Auna
5th rowRural Hosp- Auna

Common Values

Value                              Count  Frequency (%)
Ibiono Handmaids Hospital          12795  8.4%
Ikot Ekpene General Hospital       11267  7.4%
Mubi General Hospital              9662   6.3%
Ukana Cottage Hospital             8090   5.3%
Ogoja General Hospital             7090   4.6%
Etim Ekpo General Hospital         6884   4.5%
Ogoja Catholic Maternity Hospital  5596   3.7%
Numan General Hospital             4697   3.1%
Ikot Ekpene Primary Health Centre  4676   3.1%
Ukpom Abak General Hospital        4399   2.9%
Other values (91)                  77396  50.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
hospital117324
20.9%
general71208
 
12.7%
ikot23311
 
4.2%
ekpene19492
 
3.5%
health19145
 
3.4%
centre15173
 
2.7%
cottage14532
 
2.6%
ibiono14161
 
2.5%
ogoja13219
 
2.4%
handmaids12795
 
2.3%
Other values (155)241014
42.9%

Most occurring characters

ValueCountFrequency (%)
a411318
 
10.4%
408822
 
10.4%
e337930
 
8.6%
t282073
 
7.1%
o281160
 
7.1%
i250198
 
6.3%
l238516
 
6.0%
n216544
 
5.5%
s168965
 
4.3%
p167526
 
4.2%
Other values (45)1186623
30.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2971522
75.2%
Uppercase Letter554469
 
14.0%
Space Separator408822
 
10.4%
Dash Punctuation9473
 
0.2%
Other Punctuation4985
 
0.1%
Decimal Number272
 
< 0.1%
Open Punctuation66
 
< 0.1%
Close Punctuation66
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a411318
13.8%
e337930
11.4%
t282073
9.5%
o281160
9.5%
i250198
8.4%
l238516
8.0%
n216544
7.3%
s168965
 
5.7%
p167526
 
5.6%
r156300
 
5.3%
Other values (14)460992
15.5%
Uppercase Letter
ValueCountFrequency (%)
H162021
29.2%
G75615
13.6%
I60748
 
11.0%
C59004
 
10.6%
E39709
 
7.2%
M35923
 
6.5%
U20218
 
3.6%
O17088
 
3.1%
S15233
 
2.7%
N14514
 
2.6%
Other values (13)54396
 
9.8%
Other Punctuation
ValueCountFrequency (%)
.3722
74.7%
'1209
 
24.3%
&54
 
1.1%
Space Separator
ValueCountFrequency (%)
408822
100.0%
Dash Punctuation
ValueCountFrequency (%)
-9473
100.0%
Open Punctuation
ValueCountFrequency (%)
(66
100.0%
Close Punctuation
ValueCountFrequency (%)
)66
100.0%
Decimal Number
ValueCountFrequency (%)
1272
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3525991
89.3%
Common423684
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a411318
11.7%
e337930
 
9.6%
t282073
 
8.0%
o281160
 
8.0%
i250198
 
7.1%
l238516
 
6.8%
n216544
 
6.1%
s168965
 
4.8%
p167526
 
4.8%
H162021
 
4.6%
Other values (37)1009740
28.6%
Common
ValueCountFrequency (%)
408822
96.5%
-9473
 
2.2%
.3722
 
0.9%
'1209
 
0.3%
1272
 
0.1%
(66
 
< 0.1%
)66
 
< 0.1%
&54
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII3949675
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a411318
 
10.4%
408822
 
10.4%
e337930
 
8.6%
t282073
 
7.1%
o281160
 
7.1%
i250198
 
6.3%
l238516
 
6.0%
n216544
 
5.5%
s168965
 
4.3%
p167526
 
4.2%
Other values (45)1186623
30.0%

PATIENT_ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 152552
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84518.89607
Minimum: 8217
Maximum: 160854
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 8217
5-th percentile: 15844.55
Q1: 46354.75
median: 84492.5
Q3: 122705.25
95-th percentile: 153225.45
Maximum: 160854
Range: 152637
Interquartile range (IQR): 76350.5

Descriptive statistics

Standard deviation: 44069.05203
Coefficient of variation (CV): 0.521410644
Kurtosis: -1.200266362
Mean: 84518.89607
Median Absolute Deviation (MAD): 38175.5
Skewness: 0.0008710833222
Sum: 1.289352663 × 10^10
Variance: 1942081347
Monotonicity: Strictly increasing
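
Several of the PATIENT_ID statistics are simple functions of the others. The sketch below, with illustrative variable names, recomputes the range, interquartile range, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. It is a consistency check on the printed figures, not the profiler's implementation.

```python
# Figures copied from the PATIENT_ID statistics in this report.
minimum, maximum = 8217, 160854
q1, q3 = 46354.75, 122705.25
mean = 84518.89607
std_dev = 44069.05203

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std_dev / mean               # CV = standard deviation / mean
variance = std_dev ** 2           # Variance = standard deviation squared

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.9f}")
print(f"Variance: {variance:.0f}")
```

The kurtosis close to -1.2 is also what one would expect from a uniform distribution, whose excess kurtosis is -6/5, which fits the Uniform warning for this column.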
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
10245                  1       < 0.1%
17117                  1       < 0.1%
21215                  1       < 0.1%
109280                 1       < 0.1%
107233                 1       < 0.1%
113378                 1       < 0.1%
111331                 1       < 0.1%
101092                 1       < 0.1%
99045                  1       < 0.1%
105190                 1       < 0.1%
Other values (152542)  152542  > 99.9%

Value  Count  Frequency (%)
8217   1      < 0.1%
8218   1      < 0.1%
8219   1      < 0.1%
8220   1      < 0.1%
8221   1      < 0.1%
8222   1      < 0.1%
8223   1      < 0.1%
8224   1      < 0.1%
8225   1      < 0.1%
8226   1      < 0.1%

Value   Count  Frequency (%)
160854  1      < 0.1%
160853  1      < 0.1%
160852  1      < 0.1%
160851  1      < 0.1%
160850  1      < 0.1%
160849  1      < 0.1%
160848  1      < 0.1%
160847  1      < 0.1%
160846  1      < 0.1%
160845  1      < 0.1%
< 0.1%

FACILITY_ID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct101
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2056.288918
Minimum421
Maximum10026
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum421
5-th percentile434
Q1507
median612
Q31753
95-th percentile10022
Maximum10026
Range9605
Interquartile range (IQR)1246

Descriptive statistics

Standard deviation3023.349923
Coefficient of variation (CV)1.470294323
Kurtosis2.758194971
Mean2056.288918
Median Absolute Deviation (MAD)154
Skewness2.088395705
Sum313690987
Variance9140644.755
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
562                12795  8.4%
612                11267  7.4%
434                9662   6.3%
505                8090   5.3%
1753               7090   4.6%
510                6884   4.5%
1752               5596   3.7%
2881               4697   3.1%
614                4676   3.1%
459                4399   2.9%
Other values (91)  77396  50.7%

Value  Count  Frequency (%)
421    433    0.3%
425    1088   0.7%
426    1877   1.2%
433    3133   2.1%
434    9662   6.3%
436    4060   2.7%
446    72     < 0.1%
448    237    0.2%
455    2497   1.6%
458    787    0.5%

Value  Count  Frequency (%)
10026  185    0.1%
10025  2513   1.6%
10024  1822   1.2%
10023  2054   1.3%
10022  2315   1.5%
10021  227    0.1%
10020  92     0.1%
10019  662    0.4%
10018  1172   0.8%
10017  1199   0.8%

HOSPITAL_NUM
Unsupported

REJECTED
UNSUPPORTED

Missing0
Missing (%)0.0%
Memory size1.2 MiB

UNIQUE_ID
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing58108
Missing (%)38.1%
Memory size1.2 MiB

GENDER
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
Female
101537 
Male
51015 

Length

Max length6
Median length6
Mean length5.331178877
Min length4

Characters and Unicode

Total characters813282
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFemale
2nd rowMale
3rd rowFemale
4th rowFemale
5th rowFemale

Common Values

Value   Count   Frequency (%)
Female  101537  66.6%
Male    51015   33.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
female101537
66.6%
male51015
33.4%

Most occurring characters

ValueCountFrequency (%)
e254089
31.2%
a152552
18.8%
l152552
18.8%
F101537
 
12.5%
m101537
 
12.5%
M51015
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter660730
81.2%
Uppercase Letter152552
 
18.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e254089
38.5%
a152552
23.1%
l152552
23.1%
m101537
 
15.4%
Uppercase Letter
ValueCountFrequency (%)
F101537
66.6%
M51015
33.4%

Most occurring scripts

ValueCountFrequency (%)
Latin813282
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e254089
31.2%
a152552
18.8%
l152552
18.8%
F101537
 
12.5%
m101537
 
12.5%
M51015
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII813282
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e254089
31.2%
a152552
18.8%
l152552
18.8%
F101537
 
12.5%
m101537
 
12.5%
M51015
 
6.3%

DATE_BIRTH
Unsupported

REJECTED
UNSUPPORTED

Missing0
Missing (%)0.0%
Memory size1.2 MiB

AGE
Real number (ℝ≥0)

Distinct114
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean33.47607373
Minimum0
Maximum1001
Zeros4
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum0
5-th percentile15
Q126
median32
Q340
95-th percentile55
Maximum1001
Range1001
Interquartile range (IQR)14

Descriptive statistics

Standard deviation13.85632478
Coefficient of variation (CV)0.4139172618
Kurtosis970.3397768
Mean33.47607373
Median Absolute Deviation (MAD)7
Skewness14.90886752
Sum5106842
Variance191.9977363
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
30                  11961  7.8%
35                  8656   5.7%
40                  7842   5.1%
25                  6775   4.4%
32                  6108   4.0%
28                  5981   3.9%
29                  4924   3.2%
27                  4882   3.2%
45                  4715   3.1%
26                  4439   2.9%
Other values (104)  86269  56.6%

Value  Count  Frequency (%)
0      4      < 0.1%
1      825    0.5%
2      1076   0.7%
3      709    0.5%
4      666    0.4%
5      629    0.4%
6      541    0.4%
7      501    0.3%
8      439    0.3%
9      410    0.3%

Value  Count  Frequency (%)
1001   1      < 0.1%
996    1      < 0.1%
960    1      < 0.1%
935    1      < 0.1%
934    2      < 0.1%
927    1      < 0.1%
814    1      < 0.1%
230    1      < 0.1%
228    1      < 0.1%
212    1      < 0.1%

age_unit
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
year(s)
150693 
month(s)
 
1808
day(s)
 
51

Length

Max length8
Median length7
Mean length7.011517384
Min length6

Characters and Unicode

Total characters1069621
Distinct characters13
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowyear(s)
2nd rowyear(s)
3rd rowyear(s)
4th rowyear(s)
5th rowyear(s)

Common Values

Value     Count   Frequency (%)
year(s)   150693  98.8%
month(s)  1808    1.2%
day(s)    51      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
year(s150693
98.8%
month(s1808
 
1.2%
day(s51
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
(152552
14.3%
s152552
14.3%
)152552
14.3%
y150744
14.1%
a150744
14.1%
e150693
14.1%
r150693
14.1%
m1808
 
0.2%
o1808
 
0.2%
n1808
 
0.2%
Other values (3)3667
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter764517
71.5%
Open Punctuation152552
 
14.3%
Close Punctuation152552
 
14.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s152552
20.0%
y150744
19.7%
a150744
19.7%
e150693
19.7%
r150693
19.7%
m1808
 
0.2%
o1808
 
0.2%
n1808
 
0.2%
t1808
 
0.2%
h1808
 
0.2%
Open Punctuation
ValueCountFrequency (%)
(152552
100.0%
Close Punctuation
ValueCountFrequency (%)
)152552
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin764517
71.5%
Common305104
 
28.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
s152552
20.0%
y150744
19.7%
a150744
19.7%
e150693
19.7%
r150693
19.7%
m1808
 
0.2%
o1808
 
0.2%
n1808
 
0.2%
t1808
 
0.2%
h1808
 
0.2%
Common
ValueCountFrequency (%)
(152552
50.0%
)152552
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1069621
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(152552
14.3%
s152552
14.3%
)152552
14.3%
y150744
14.1%
a150744
14.1%
e150693
14.1%
r150693
14.1%
m1808
 
0.2%
o1808
 
0.2%
n1808
 
0.2%
Other values (3)3667
 
0.3%

marital_status
Categorical

MISSING

Distinct7
Distinct (%)< 0.1%
Missing4682
Missing (%)3.1%
Memory size1.2 MiB
Married
88258 
Single
47135 
Widowed
 
7175
Separated
 
3062
Divorced
 
1604
Other values (2)
 
636

Length

Max length9
Median length7
Mean length6.737884628
Min length6

Characters and Unicode

Total characters996331
Distinct characters18
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSingle
2nd rowWidowed
3rd rowDivorced
4th rowSingle
5th rowMarried

Common Values

Value      Count  Frequency (%)
Married    88258  57.9%
Single     47135  30.9%
Widowed    7175   4.7%
Separated  3062   2.0%
Divorced   1604   1.1%
Windowed   624    0.4%
Seperated  12     < 0.1%
(Missing)  4682   3.1%
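
Two of the seven marital_status labels, "Windowed" and "Seperated", look like misspellings of "Widowed" and "Separated". Assuming that reading is correct (it is a judgment call, not something the report confirms), the sketch below merges them and recomputes the counts. The `CORRECTIONS` mapping is hypothetical.

```python
from collections import Counter

# Counts copied from the Common Values table for marital_status.
raw_counts = {
    "Married": 88_258,
    "Single": 47_135,
    "Widowed": 7_175,
    "Separated": 3_062,
    "Divorced": 1_604,
    "Windowed": 624,
    "Seperated": 12,
}

# Assumed corrections: treating these labels as typos is a judgment call.
CORRECTIONS = {"Windowed": "Widowed", "Seperated": "Separated"}

clean_counts = Counter()
for label, count in raw_counts.items():
    # Map a misspelled label to its corrected form; leave others unchanged.
    clean_counts[CORRECTIONS.get(label, label)] += count

for label, count in clean_counts.most_common():
    print(f"{label}: {count}")
```

After merging, five distinct categories remain, and the non-missing total is unchanged at 147870.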

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
married88258
59.7%
single47135
31.9%
widowed7175
 
4.9%
separated3062
 
2.1%
divorced1604
 
1.1%
windowed624
 
0.4%
seperated12
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
r181194
18.2%
e150956
15.2%
i144796
14.5%
d108534
10.9%
a94394
9.5%
M88258
8.9%
S50209
 
5.0%
n47759
 
4.8%
g47135
 
4.7%
l47135
 
4.7%
Other values (8)35961
 
3.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter848461
85.2%
Uppercase Letter147870
 
14.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r181194
21.4%
e150956
17.8%
i144796
17.1%
d108534
12.8%
a94394
11.1%
n47759
 
5.6%
g47135
 
5.6%
l47135
 
5.6%
o9403
 
1.1%
w7799
 
0.9%
Other values (4)9356
 
1.1%
Uppercase Letter
ValueCountFrequency (%)
M88258
59.7%
S50209
34.0%
W7799
 
5.3%
D1604
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Latin996331
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r181194
18.2%
e150956
15.2%
i144796
14.5%
d108534
10.9%
a94394
9.5%
M88258
8.9%
S50209
 
5.0%
n47759
 
4.8%
g47135
 
4.7%
l47135
 
4.7%
Other values (8)35961
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII996331
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r181194
18.2%
e150956
15.2%
i144796
14.5%
d108534
10.9%
a94394
9.5%
M88258
8.9%
S50209
 
5.0%
n47759
 
4.8%
g47135
 
4.7%
l47135
 
4.7%
Other values (8)35961
 
3.6%

education
Categorical

HIGH CORRELATION
MISSING

Distinct12
Distinct (%)< 0.1%
Missing9007
Missing (%)5.9%
Memory size1.2 MiB
Senior Secondary
64628 
Primary
31531 
None
23427 
Post Secondary
14331 
Quranic
 
4860
Other values (7)
 
4768

Length

Max length35
Median length14
Mean length11.53406946
Min length4

Characters and Unicode

Total characters1655658
Distinct characters30
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPrimary
2nd rowQuranic
3rd rowQuranic
4th rowPrimary
5th rowQuranic

Common Values

Value                                Count  Frequency (%)
Senior Secondary                     64628  42.4%
Primary                              31531  20.7%
None                                 23427  15.4%
Post Secondary                       14331  9.4%
Quranic                              4860   3.2%
Junior Secondary                     4340   2.8%
NONE                                 283    0.2%
primary                              74     < 0.1%
QURANIC EDUCATION                    57     < 0.1%
QURANIC EDUCATION,QURANIC EDUCATION  9      < 0.1%
Other values (2)                     5      < 0.1%
(Missing)                            9007   5.9%
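
The education labels mix letter case ("None" and "NONE", "Primary" and "primary") and include a comma-joined repeat ("QURANIC EDUCATION,QURANIC EDUCATION"). A minimal cleaning sketch follows. The function name `normalize_education` and the choice of title case are assumptions, and whether "Quranic Education" should be further merged with "Quranic" is left open because the report does not say.

```python
def normalize_education(raw):
    """Collapse case variants and repeated comma-separated entries.

    Note: the result "None" is a text label meaning no formal education,
    not Python's None object.
    """
    seen = []
    for part in raw.split(","):
        # Squeeze internal whitespace and apply a consistent title case.
        label = " ".join(part.split()).title()
        if label and label not in seen:
            seen.append(label)
    return ", ".join(seen)


for raw in ["NONE", "primary", "QURANIC EDUCATION,QURANIC EDUCATION"]:
    print(repr(raw), "->", repr(normalize_education(raw)))
```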

Length

Histogram of lengths of the category
ValueCountFrequency (%)
secondary83304
36.7%
senior64631
28.5%
primary31605
 
13.9%
none23710
 
10.4%
post14333
 
6.3%
quranic4926
 
2.2%
junior4340
 
1.9%
education66
 
< 0.1%
education,quranic9
 
< 0.1%
secondary,senior3
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
r220353
13.3%
o190045
11.5%
n180570
10.9%
e171370
10.4%
S147943
8.9%
a119774
7.2%
y114914
6.9%
i105439
6.4%
c88169
 
5.3%
83384
 
5.0%
Other values (20)233697
14.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1343492
81.1%
Uppercase Letter228768
 
13.8%
Space Separator83384
 
5.0%
Other Punctuation14
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S147943
64.7%
P45866
 
20.0%
N24143
 
10.6%
Q4935
 
2.2%
J4340
 
1.9%
E358
 
0.2%
O358
 
0.2%
U150
 
0.1%
A150
 
0.1%
I150
 
0.1%
Other values (4)375
 
0.2%
Lowercase Letter
ValueCountFrequency (%)
r220353
16.4%
o190045
14.1%
n180570
13.4%
e171370
12.8%
a119774
8.9%
y114914
8.6%
i105439
7.8%
c88169
6.6%
d83309
 
6.2%
m31605
 
2.4%
Other values (4)37944
 
2.8%
Space Separator
ValueCountFrequency (%)
83384
100.0%
Other Punctuation
ValueCountFrequency (%)
,14
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1572260
95.0%
Common83398
 
5.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r220353
14.0%
o190045
12.1%
n180570
11.5%
e171370
10.9%
S147943
9.4%
a119774
7.6%
y114914
7.3%
i105439
6.7%
c88169
5.6%
d83309
 
5.3%
Other values (18)150374
9.6%
Common
ValueCountFrequency (%)
83384
> 99.9%
,14
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1655658
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r220353
13.3%
o190045
11.5%
n180570
10.9%
e171370
10.4%
S147943
8.9%
a119774
7.2%
y114914
6.9%
i105439
6.4%
c88169
 
5.3%
83384
 
5.0%
Other values (20)233697
14.1%

OCCUPATION
Categorical

HIGH CORRELATION
MISSING

Distinct4
Distinct (%)< 0.1%
Missing6239
Missing (%)4.1%
Memory size1.2 MiB
Unemployed
96065 
Employed
42141 
Student
 
7278
Retired
 
829

Length

Max length10
Median length10
Mean length9.257735129
Min length7

Characters and Unicode

Total characters1354527
Distinct characters16
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowUnemployed
2nd rowEmployed
3rd rowUnemployed
4th rowUnemployed
5th rowUnemployed

Common Values

Value       Count  Frequency (%)
Unemployed  96065  63.0%
Employed    42141  27.6%
Student     7278   4.8%
Retired     829    0.5%
(Missing)   6239   4.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
unemployed96065
65.7%
employed42141
28.8%
student7278
 
5.0%
retired829
 
0.6%

Most occurring characters

ValueCountFrequency (%)
e243207
18.0%
d146313
10.8%
m138206
10.2%
p138206
10.2%
l138206
10.2%
o138206
10.2%
y138206
10.2%
n103343
7.6%
U96065
 
7.1%
E42141
 
3.1%
Other values (6)32428
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1208214
89.2%
Uppercase Letter146313
 
10.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e243207
20.1%
d146313
12.1%
m138206
11.4%
p138206
11.4%
l138206
11.4%
o138206
11.4%
y138206
11.4%
n103343
8.6%
t15385
 
1.3%
u7278
 
0.6%
Other values (2)1658
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
U96065
65.7%
E42141
28.8%
S7278
 
5.0%
R829
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Latin1354527
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e243207
18.0%
d146313
10.8%
m138206
10.2%
p138206
10.2%
l138206
10.2%
o138206
10.2%
y138206
10.2%
n103343
7.6%
U96065
 
7.1%
E42141
 
3.1%
Other values (6)32428
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII1354527
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e243207
18.0%
d146313
10.8%
m138206
10.2%
p138206
10.2%
l138206
10.2%
o138206
10.2%
y138206
10.2%
n103343
7.6%
U96065
 
7.1%
E42141
 
3.1%
Other values (6)32428
 
2.4%

STATE
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct55
Distinct (%)< 0.1%
Missing2937
Missing (%)1.9%
Memory size1.2 MiB
Akwa Ibom
79917 
Adamawa
24646 
Cross River
19215 
Niger
17953 
NIGER
 
2976
Other values (50)
 
4908

Length

Max length23
Median length9
Mean length8.237476189
Min length3

Characters and Unicode

Total characters1232450
Distinct characters48
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique10 ?
Unique (%)< 0.1%

Sample

1st rowNiger
2nd rowNiger
3rd rowNiger
4th rowNiger
5th rowNiger

Common Values

Value | Count | Frequency (%)
Akwa Ibom | 79917 | 52.4%
Adamawa | 24646 | 16.2%
Cross River | 19215 | 12.6%
Niger | 17953 | 11.8%
NIGER | 2976 | 2.0%
Benue | 2213 | 1.5%
Borno | 855 | 0.6%
Abia | 455 | 0.3%
Kebbi | 370 | 0.2%
Ebonyi | 356 | 0.2%
Other values (45) | 659 | 0.4%
(Missing) | 2937 | 1.9%
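
The STATE counts above list "Niger" and "NIGER" as separate categories, which contributes to the high-cardinality warning. The sketch below shows one possible way to merge case variants before counting. It assumes that title-casing and whitespace collapsing are appropriate for this column, which should be confirmed against the source data; `state_counts` and `merge_case_variants` are hypothetical names, and only the six most frequent values from the report are included.

```python
from collections import Counter

# Six most frequent STATE values copied from the report; the remaining values are omitted.
state_counts = {
    "Akwa Ibom": 79917,
    "Adamawa": 24646,
    "Cross River": 19215,
    "Niger": 17953,
    "NIGER": 2976,
    "Benue": 2213,
}


def merge_case_variants(value_counts):
    """Collapse whitespace and title-case each value, summing counts of variants."""
    merged = Counter()
    for value, count in value_counts.items():
        key = " ".join(value.split()).title()
        merged[key] += count
    return merged


merged = merge_case_variants(state_counts)
print(merged["Niger"], len(merged))
```

The merged "Niger" total of 20929 agrees with the lowercased token count for "niger" reported for this variable.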

Length

Histogram of lengths of the category
ValueCountFrequency (%)
akwa79917
32.1%
ibom79917
32.1%
adamawa24648
 
9.9%
niger20929
 
8.4%
cross19215
 
7.7%
river19215
 
7.7%
benue2216
 
0.9%
borno855
 
0.3%
abia456
 
0.2%
kebbi374
 
0.2%
Other values (34)1009
 
0.4%

Most occurring characters

ValueCountFrequency (%)
a154947
12.6%
A105097
8.5%
m104658
8.5%
w104626
8.5%
o101353
 
8.2%
99136
 
8.0%
I82951
 
6.7%
b81601
 
6.6%
k79922
 
6.5%
r57568
 
4.7%
Other values (38)260591
21.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter872445
70.8%
Uppercase Letter260866
 
21.2%
Space Separator99136
 
8.0%
Other Punctuation3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A105097
40.3%
I82951
31.8%
R22354
 
8.6%
N20954
 
8.0%
C19236
 
7.4%
E3394
 
1.3%
B3092
 
1.2%
G3004
 
1.2%
K550
 
0.2%
T106
 
< 0.1%
Other values (14)128
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
a154947
17.8%
m104658
12.0%
w104626
12.0%
o101353
11.6%
b81601
9.4%
k79922
9.2%
r57568
 
6.6%
e42157
 
4.8%
s38608
 
4.4%
i38553
 
4.4%
Other values (11)68452
7.8%
Other Punctuation
ValueCountFrequency (%)
,2
66.7%
.1
33.3%
Space Separator
ValueCountFrequency (%)
99136
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1133311
92.0%
Common99139
 
8.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a154947
13.7%
A105097
9.3%
m104658
9.2%
w104626
9.2%
o101353
8.9%
I82951
 
7.3%
b81601
 
7.2%
k79922
 
7.1%
r57568
 
5.1%
e42157
 
3.7%
Other values (35)218431
19.3%
Common
ValueCountFrequency (%)
99136
> 99.9%
,2
 
< 0.1%
.1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1232450
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a154947
12.6%
A105097
8.5%
m104658
8.5%
w104626
8.5%
o101353
 
8.2%
99136
 
8.0%
I82951
 
6.7%
b81601
 
6.6%
k79922
 
6.5%
r57568
 
4.7%
Other values (38)260591
21.1%

lga
Categorical

HIGH CARDINALITY
MISSING

Distinct579
Distinct (%)0.4%
Missing2543
Missing (%)1.7%
Memory size1.2 MiB
Ikot Ekpene
12481 
Essien Udim
10670 
Ibiono-Ibom
 
7921
Itu
 
7497
Abak
 
7276
Other values (574)
104164 

Length

Max length56
Median length5
Mean length6.831183462
Min length3

Characters and Unicode

Total characters1024739
Distinct characters63
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique278 ?
Unique (%)0.2%

Sample

1st rowMagama
2nd rowMagama
3rd rowMagama
4th rowMagama
5th rowMagama

Common Values

Value | Count | Frequency (%)
Ikot Ekpene | 12481 | 8.2%
Essien Udim | 10670 | 7.0%
Ibiono-Ibom | 7921 | 5.2%
Itu | 7497 | 4.9%
Abak | 7276 | 4.8%
Ikono | 7067 | 4.6%
Ogoja | 5211 | 3.4%
Etim Ekpo | 4814 | 3.2%
Uyo | 4626 | 3.0%
Mubi South | 4592 | 3.0%
Other values (569) | 77854 | 51.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ikot12615
 
6.5%
ekpene12481
 
6.5%
udim10670
 
5.5%
essien10670
 
5.5%
ibiono-ibom7921
 
4.1%
itu7497
 
3.9%
mubi7280
 
3.8%
abak7276
 
3.8%
ikono7067
 
3.7%
ogoja5211
 
2.7%
Other values (652)104237
54.0%

Most occurring characters

ValueCountFrequency (%)
o101753
 
9.9%
a81905
 
8.0%
i72798
 
7.1%
k71305
 
7.0%
n64391
 
6.3%
I53359
 
5.2%
e45352
 
4.4%
b43085
 
4.2%
42918
 
4.2%
t42511
 
4.1%
Other values (53)405362
39.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter756673
73.8%
Uppercase Letter216098
 
21.1%
Space Separator42918
 
4.2%
Dash Punctuation8304
 
0.8%
Other Punctuation731
 
0.1%
Decimal Number14
 
< 0.1%
Open Punctuation1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I53359
24.7%
E35279
16.3%
O19446
 
9.0%
U18711
 
8.7%
A17333
 
8.0%
M17021
 
7.9%
S10531
 
4.9%
B9359
 
4.3%
N5835
 
2.7%
K4638
 
2.1%
Other values (15)24586
11.4%
Lowercase Letter
ValueCountFrequency (%)
o101753
13.4%
a81905
10.8%
i72798
9.6%
k71305
9.4%
n64391
8.5%
e45352
 
6.0%
b43085
 
5.7%
t42511
 
5.6%
u42402
 
5.6%
m32054
 
4.2%
Other values (14)159117
21.0%
Decimal Number
Value | Count | Frequency (%)
1 | 5 | 35.7%
7 | 2 | 14.3%
4 | 2 | 14.3%
0 | 1 | 7.1%
5 | 1 | 7.1%
3 | 1 | 7.1%
9 | 1 | 7.1%
6 | 1 | 7.1%
Other Punctuation
ValueCountFrequency (%)
/724
99.0%
,5
 
0.7%
.2
 
0.3%
Space Separator
ValueCountFrequency (%)
42918
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8304
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin972771
94.9%
Common51968
 
5.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o101753
 
10.5%
a81905
 
8.4%
i72798
 
7.5%
k71305
 
7.3%
n64391
 
6.6%
I53359
 
5.5%
e45352
 
4.7%
b43085
 
4.4%
t42511
 
4.4%
u42402
 
4.4%
Other values (39)353910
36.4%
Common
ValueCountFrequency (%)
42918
82.6%
-8304
 
16.0%
/724
 
1.4%
,5
 
< 0.1%
15
 
< 0.1%
.2
 
< 0.1%
72
 
< 0.1%
42
 
< 0.1%
(1
 
< 0.1%
01
 
< 0.1%
Other values (4)4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1024739
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o101753
 
9.9%
a81905
 
8.0%
i72798
 
7.1%
k71305
 
7.0%
n64391
 
6.3%
I53359
 
5.2%
e45352
 
4.4%
b43085
 
4.2%
42918
 
4.2%
t42511
 
4.1%
Other values (53)405362
39.6%

entry_point
Categorical

HIGH CORRELATION
MISSING

Distinct13
Distinct (%)< 0.1%
Missing47050
Missing (%)30.8%
Memory size1.2 MiB
HCT
36533 
Outreach
32067 
OPD
17813 
Others
7071 
Transfer-in
4074 
Other values (8)
7944 

Length

Max length11
Median length3
Mean length5.467299198
Min length3

Characters and Unicode

Total characters576811
Distinct characters28
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowOPD
2nd rowOPD
3rd rowOPD
4th rowOPD
5th rowOPD

Common Values

Value | Count | Frequency (%)
HCT | 36533 | 23.9%
Outreach | 32067 | 21.0%
OPD | 17813 | 11.7%
Others | 7071 | 4.6%
Transfer-in | 4074 | 2.7%
Outreaches | 3098 | 2.0%
In-patient | 1722 | 1.1%
PMTCT | 1137 | 0.7%
ANC/PMTCT | 758 | 0.5%
TB DOTS | 581 | 0.4%
Other values (3) | 648 | 0.4%
(Missing) | 47050 | 30.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
hct36533
34.3%
outreach32067
30.1%
opd17813
16.7%
others7071
 
6.6%
transfer-in4074
 
3.8%
outreaches3098
 
2.9%
in-patient1722
 
1.6%
pmtct1137
 
1.1%
anc/pmtct758
 
0.7%
tb581
 
0.5%
Other values (6)1651
 
1.6%

Most occurring characters

ValueCountFrequency (%)
O60856
10.6%
e51455
8.9%
r50384
 
8.7%
t46330
 
8.0%
T45656
 
7.9%
h42236
 
7.3%
a41286
 
7.2%
C39509
 
6.8%
H36533
 
6.3%
c35262
 
6.1%
Other values (18)127304
22.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter341558
59.2%
Uppercase Letter227696
39.5%
Dash Punctuation5796
 
1.0%
Space Separator1003
 
0.2%
Other Punctuation758
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e51455
15.1%
r50384
14.8%
t46330
13.6%
h42236
12.4%
a41286
12.1%
c35262
10.3%
u35165
10.3%
s14568
 
4.3%
n12339
 
3.6%
i6315
 
1.8%
Other values (3)6218
 
1.8%
Uppercase Letter
ValueCountFrequency (%)
O60856
26.7%
T45656
20.1%
C39509
17.4%
H36533
16.0%
P19708
 
8.7%
D18394
 
8.1%
I2144
 
0.9%
M1895
 
0.8%
B807
 
0.4%
A758
 
0.3%
Other values (2)1436
 
0.6%
Dash Punctuation
ValueCountFrequency (%)
-5796
100.0%
Space Separator
ValueCountFrequency (%)
1003
100.0%
Other Punctuation
ValueCountFrequency (%)
/758
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin569254
98.7%
Common7557
 
1.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
O60856
10.7%
e51455
9.0%
r50384
8.9%
t46330
 
8.1%
T45656
 
8.0%
h42236
 
7.4%
a41286
 
7.3%
C39509
 
6.9%
H36533
 
6.4%
c35262
 
6.2%
Other values (15)119747
21.0%
Common
ValueCountFrequency (%)
-5796
76.7%
1003
 
13.3%
/758
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII576811
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
O60856
10.6%
e51455
8.9%
r50384
 
8.7%
t46330
 
8.0%
T45656
 
7.9%
h42236
 
7.3%
a41286
 
7.2%
C39509
 
6.8%
H36533
 
6.3%
c35262
 
6.1%
Other values (18)127304
22.1%

DATE_CONFIRMED_HIV
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 51946
Missing (%): 34.1%
Memory size: 1.2 MiB
Distinct: 513
Distinct (%): 48.2%
Missing: 151488
Missing (%): 99.3%
Memory size: 1.2 MiB
Minimum: 2007-02-21 00:00:00
Maximum: 2021-05-28 00:00:00
Histogram with fixed size bins (bins=50)

SOURCE_REFERRAL
Categorical

HIGH CORRELATION
MISSING

Distinct8
Distinct (%)2.1%
Missing152177
Missing (%)99.8%
Memory size1.2 MiB
Self-referral
209 
PMTCT outreach
69 
In-patients
40 
Private/Commercial Health facility
34 
Medical outpatient
 
18
Other values (3)
 
5

Length

Max length34
Median length13
Mean length15.14666667
Min length1

Characters and Unicode

Total characters5680
Distinct characters33
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.5%

Sample

1st rowPMTCT outreach
2nd rowPMTCT outreach
3rd rowIn-patients
4th rowIn-patients
5th rowIn-patients

Common Values

Value | Count | Frequency (%)
Self-referral | 209 | 0.1%
PMTCT outreach | 69 | < 0.1%
In-patients | 40 | < 0.1%
Private/Commercial Health facility | 34 | < 0.1%
Medical outpatient | 18 | < 0.1%
External HCT centre | 3 | < 0.1%
Sex worker outreach | 1 | < 0.1%
(single control character) | 1 | < 0.1%
(Missing) | 152177 | 99.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
self-referral209
38.8%
outreach70
 
13.0%
pmtct69
 
12.8%
in-patients40
 
7.4%
private/commercial34
 
6.3%
facility34
 
6.3%
health34
 
6.3%
outpatient18
 
3.3%
medical18
 
3.3%
centre3
 
0.6%
Other values (5)9
 
1.7%

Most occurring characters

ValueCountFrequency (%)
e886
15.6%
r773
13.6%
l541
 
9.5%
a494
 
8.7%
f452
 
8.0%
t312
 
5.5%
-249
 
4.4%
i212
 
3.7%
S210
 
3.7%
163
 
2.9%
Other values (23)1388
24.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4506
79.3%
Uppercase Letter727
 
12.8%
Dash Punctuation249
 
4.4%
Space Separator163
 
2.9%
Other Punctuation34
 
0.6%
Control1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e886
19.7%
r773
17.2%
l541
12.0%
a494
11.0%
f452
10.0%
t312
 
6.9%
i212
 
4.7%
c159
 
3.5%
o123
 
2.7%
h104
 
2.3%
Other values (11)450
10.0%
Uppercase Letter
ValueCountFrequency (%)
S210
28.9%
T141
19.4%
C106
14.6%
P103
14.2%
M87
12.0%
I40
 
5.5%
H37
 
5.1%
E3
 
0.4%
Space Separator
ValueCountFrequency (%)
163
100.0%
Dash Punctuation
ValueCountFrequency (%)
-249
100.0%
Control
ValueCountFrequency (%)
1
100.0%
Other Punctuation
ValueCountFrequency (%)
/34
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5233
92.1%
Common447
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e886
16.9%
r773
14.8%
l541
10.3%
a494
9.4%
f452
8.6%
t312
 
6.0%
i212
 
4.1%
S210
 
4.0%
c159
 
3.0%
T141
 
2.7%
Other values (19)1053
20.1%
Common
ValueCountFrequency (%)
-249
55.7%
163
36.5%
/34
 
7.6%
1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII5680
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e886
15.6%
r773
13.6%
l541
 
9.5%
a494
 
8.7%
f452
 
8.0%
t312
 
5.5%
-249
 
4.4%
i212
 
3.7%
S210
 
3.7%
163
 
2.9%
Other values (23)1388
24.4%

TIME_HIV_DIAGNOSIS
Categorical

HIGH CORRELATION
MISSING

Distinct12
Distinct (%)2.7%
Missing152100
Missing (%)99.7%
Memory size1.2 MiB
ANC
229 
Newly Tested HIV+ (ANC)
83 
Previously known HIV+ (ANC)
72 
Previous known HIV+ (ANC)
26 
Previous pregnancy (ANC)
 
18
Other values (7)
24 

Length

Max length33
Median length3
Mean length13.5420354
Min length3

Characters and Unicode

Total characters6121
Distinct characters39
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.4%

Sample

1st rowANC
2nd rowANC
3rd rowPrevious pregnancy (ANC)
4th rowANC
5th rowANC

Common Values

Value | Count | Frequency (%)
ANC | 229 | 0.2%
Newly Tested HIV+ (ANC) | 83 | 0.1%
Previously known HIV+ (ANC) | 72 | < 0.1%
Previous known HIV+ (ANC) | 26 | < 0.1%
Previous pregnancy (ANC) | 18 | < 0.1%
Previous - Non pregnant | 8 | < 0.1%
Labour | 6 | < 0.1%
Previously konwn HIV+ (PP >72hrs) | 3 | < 0.1%
Previous pregnancy (L&D) | 3 | < 0.1%
Newly Tested HIV+ (PP >72hrs) | 2 | < 0.1%
Other values (2) | 2 | < 0.1%
(Missing) | 152100 | 99.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
anc428
39.4%
hiv188
17.3%
known99
 
9.1%
tested86
 
7.9%
newly86
 
7.9%
previously76
 
7.0%
previous55
 
5.1%
pregnancy21
 
1.9%
8
 
0.7%
non8
 
0.7%
Other values (6)32
 
2.9%

Most occurring characters

ValueCountFrequency (%)
635
 
10.4%
N522
 
8.5%
A428
 
7.0%
C428
 
7.0%
e418
 
6.8%
n270
 
4.4%
o247
 
4.0%
s222
 
3.6%
(209
 
3.4%
)209
 
3.4%
Other values (29)2533
41.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2667
43.6%
Uppercase Letter2185
35.7%
Space Separator635
 
10.4%
Open Punctuation209
 
3.4%
Close Punctuation209
 
3.4%
Math Symbol193
 
3.2%
Decimal Number10
 
0.2%
Dash Punctuation8
 
0.1%
Other Punctuation5
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e418
15.7%
n270
10.1%
o247
9.3%
s222
8.3%
w188
 
7.0%
y183
 
6.9%
r171
 
6.4%
l162
 
6.1%
u137
 
5.1%
v131
 
4.9%
Other values (10)538
20.2%
Uppercase Letter
ValueCountFrequency (%)
N522
23.9%
A428
19.6%
C428
19.6%
H188
 
8.6%
I188
 
8.6%
V188
 
8.6%
P141
 
6.5%
T86
 
3.9%
L11
 
0.5%
D5
 
0.2%
Math Symbol
ValueCountFrequency (%)
+188
97.4%
>5
 
2.6%
Decimal Number
Value | Count | Frequency (%)
7 | 5 | 50.0%
2 | 5 | 50.0%
Space Separator
ValueCountFrequency (%)
635
100.0%
Open Punctuation
ValueCountFrequency (%)
(209
100.0%
Close Punctuation
ValueCountFrequency (%)
)209
100.0%
Other Punctuation
ValueCountFrequency (%)
&5
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4852
79.3%
Common1269
 
20.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
N522
 
10.8%
A428
 
8.8%
C428
 
8.8%
e418
 
8.6%
n270
 
5.6%
o247
 
5.1%
s222
 
4.6%
w188
 
3.9%
H188
 
3.9%
I188
 
3.9%
Other values (20)1753
36.1%
Common
ValueCountFrequency (%)
635
50.0%
(209
 
16.5%
)209
 
16.5%
+188
 
14.8%
-8
 
0.6%
&5
 
0.4%
>5
 
0.4%
75
 
0.4%
25
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII6121
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
635
 
10.4%
N522
 
8.5%
A428
 
7.0%
C428
 
7.0%
e418
 
6.8%
n270
 
4.4%
o247
 
4.0%
s222
 
3.6%
(209
 
3.4%
)209
 
3.4%
Other values (29)2533
41.4%

tb_status
Categorical

HIGH CORRELATION
MISSING

Distinct23
Distinct (%)< 0.1%
Missing51879
Missing (%)34.0%
Memory size1.2 MiB
No sign or symptoms of TB
91713 
Currently on INH prophylaxis
 
5472
No signs or symptoms of TB
 
1790
TB suspected and referred for evaluation
 
1192
Currently on TB treatment
 
215
Other values (18)
 
291

Length

Max length129
Median length25
Mean length25.40309716
Min length21

Characters and Unicode

Total characters2557406
Distinct characters40
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7 ?
Unique (%)< 0.1%

Sample

1st rowNo sign or symptoms of TB
2nd rowNo sign or symptoms of TB
3rd rowCurrently on INH prophylaxis
4th rowNo sign or symptoms of TB
5th rowCurrently on INH prophylaxis

Common Values

Value | Count | Frequency (%)
No sign or symptoms of TB | 91713 | 60.1%
Currently on INH prophylaxis | 5472 | 3.6%
No signs or symptoms of TB | 1790 | 1.2%
TB suspected and referred for evaluation | 1192 | 0.8%
Currently on TB treatment | 215 | 0.1%
Patient with signs and symptoms of TB | 182 | 0.1%
No signs or symptoms of TB,No signs or symptoms of TB | 39 | < 0.1%
TB positive not on TB drugs | 15 | < 0.1%
No signs or symptoms of TB,Currently on INH prophylaxis | 12 | < 0.1%
TB Positive not on drugs | 12 | < 0.1%
Other values (13) | 31 | < 0.1%
(Missing) | 51879 | 34.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tb95192
16.0%
symptoms93802
15.8%
of93802
15.8%
or93605
15.8%
no93556
15.8%
sign91713
15.5%
on5737
 
1.0%
currently5693
 
1.0%
inh5499
 
0.9%
prophylaxis5494
 
0.9%
Other values (28)9047
 
1.5%

Most occurring characters

ValueCountFrequency (%)
492467
19.3%
o388518
15.2%
s291442
11.4%
m187824
 
7.3%
r115553
 
4.5%
n108292
 
4.2%
p106000
 
4.1%
y105007
 
4.1%
t103223
 
4.0%
i100971
 
3.9%
Other values (30)558109
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1758274
68.8%
Space Separator492467
 
19.3%
Uppercase Letter306593
 
12.0%
Other Punctuation72
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o388518
22.1%
s291442
16.6%
m187824
10.7%
r115553
 
6.6%
n108292
 
6.2%
p106000
 
6.0%
y105007
 
6.0%
t103223
 
5.9%
i100971
 
5.7%
f96195
 
5.5%
Other values (11)155249
 
8.8%
Uppercase Letter
ValueCountFrequency (%)
N99106
32.3%
T95266
31.1%
B95260
31.1%
C5717
 
1.9%
I5500
 
1.8%
H5499
 
1.8%
P226
 
0.1%
S4
 
< 0.1%
E3
 
< 0.1%
F3
 
< 0.1%
Other values (7)9
 
< 0.1%
Space Separator
ValueCountFrequency (%)
492467
100.0%
Other Punctuation
ValueCountFrequency (%)
,72
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2064867
80.7%
Common492539
 
19.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o388518
18.8%
s291442
14.1%
m187824
9.1%
r115553
 
5.6%
n108292
 
5.2%
p106000
 
5.1%
y105007
 
5.1%
t103223
 
5.0%
i100971
 
4.9%
N99106
 
4.8%
Other values (28)458931
22.2%
Common
ValueCountFrequency (%)
492467
> 99.9%
,72
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2557406
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
492467
19.3%
o388518
15.2%
s291442
11.4%
m187824
 
7.3%
r115553
 
4.5%
n108292
 
4.2%
p106000
 
4.1%
y105007
 
4.1%
t103223
 
4.0%
i100971
 
3.9%
Other values (30)558109
21.8%

PREGNANT
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
0
151345 
1
 
1207

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters152552
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value | Count | Frequency (%)
0 | 151345 | 99.2%
1 | 1207 | 0.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 151345 | 99.2%
1 | 1207 | 0.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 151345 | 99.2%
1 | 1207 | 0.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 152552 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 151345 | 99.2%
1 | 1207 | 0.8%

Most occurring scripts

Value | Count | Frequency (%)
Common | 152552 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 151345 | 99.2%
1 | 1207 | 0.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 152552 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 151345 | 99.2%
1 | 1207 | 0.8%

BREASTFEEDING
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.2 MiB
0
151825 
1
 
727

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters152552
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value | Count | Frequency (%)
0 | 151825 | 99.5%
1 | 727 | 0.5%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 151825 | 99.5%
1 | 727 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 151825 | 99.5%
1 | 727 | 0.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 152552 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 151825 | 99.5%
1 | 727 | 0.5%

Most occurring scripts

Value | Count | Frequency (%)
Common | 152552 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 151825 | 99.5%
1 | 727 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 152552 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 151825 | 99.5%
1 | 727 | 0.5%

DATE_REGISTRATION
Unsupported

REJECTED
UNSUPPORTED

Missing141
Missing (%)0.1%
Memory size1.2 MiB

STATUS_REGISTRATION
Categorical

HIGH CORRELATION

Distinct10
Distinct (%)< 0.1%
Missing940
Missing (%)0.6%
Memory size1.2 MiB
HIV+ Non ART
83144 
HIV+ non ART
61106 
ART Transfer In
 
6296
Pre-ART Transfer In
 
413
HIV exposed Infant status unknown
 
308
Other values (5)
 
345

Length

Max length34
Median length12
Mean length12.21394744
Min length12

Characters and Unicode

Total characters1851781
Distinct characters32
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHIV+ Non ART
2nd rowHIV+ Non ART
3rd rowHIV+ Non ART
4th rowHIV+ Non ART
5th rowHIV+ Non ART

Common Values

Value | Count | Frequency (%)
HIV+ Non ART | 83144 | 54.5%
HIV+ non ART | 61106 | 40.1%
ART Transfer In | 6296 | 4.1%
Pre-ART Transfer In | 413 | 0.3%
HIV exposed Infant status unknown | 308 | 0.2%
HIV exposed status unknown | 170 | 0.1%
HIV Exposed Status Unknown | 91 | 0.1%
ART Start - external | 56 | < 0.1%
HIV negative | 24 | < 0.1%
HIV exposed Infant status negative | 4 | < 0.1%
(Missing) | 940 | 0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
art150602
33.0%
hiv144847
31.8%
non144250
31.7%
transfer6709
 
1.5%
in6709
 
1.5%
status573
 
0.1%
exposed573
 
0.1%
unknown569
 
0.1%
pre-art413
 
0.1%
infant312
 
0.1%
Other values (4)196
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
304141
16.4%
n221189
11.9%
T157724
8.5%
I151868
8.2%
A151015
8.2%
R151015
8.2%
o145392
7.9%
H144847
7.8%
V144847
7.8%
+144250
7.8%
Other values (22)135493
7.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter985202
53.2%
Lowercase Letter417719
22.6%
Space Separator304141
 
16.4%
Math Symbol144250
 
7.8%
Dash Punctuation469
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n221189
53.0%
o145392
34.8%
r13943
 
3.3%
e8345
 
2.0%
s8337
 
2.0%
a7734
 
1.9%
f7021
 
1.7%
t1654
 
0.4%
u1051
 
0.3%
x629
 
0.2%
Other values (8)2424
 
0.6%
Uppercase Letter
ValueCountFrequency (%)
T157724
16.0%
I151868
15.4%
A151015
15.3%
R151015
15.3%
H144847
14.7%
V144847
14.7%
N83144
8.4%
P413
 
< 0.1%
S147
 
< 0.1%
E91
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+144250
100.0%
Space Separator
ValueCountFrequency (%)
304141
100.0%
Dash Punctuation
ValueCountFrequency (%)
-469
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1402921
75.8%
Common448860
 
24.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
n221189
15.8%
T157724
11.2%
I151868
10.8%
A151015
10.8%
R151015
10.8%
o145392
10.4%
H144847
10.3%
V144847
10.3%
N83144
 
5.9%
r13943
 
1.0%
Other values (19)37937
 
2.7%
Common
ValueCountFrequency (%)
304141
67.8%
+144250
32.1%
-469
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1851781
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
304141
16.4%
n221189
11.9%
T157724
8.5%
I151868
8.2%
A151015
8.2%
R151015
8.2%
o145392
7.9%
H144847
7.8%
V144847
7.8%
+144250
7.8%
Other values (22)135493
7.3%

ENROLLMENT_SETTING
Categorical

HIGH CORRELATION
MISSING

Distinct8
Distinct (%)< 0.1%
Missing35250
Missing (%)23.1%
Memory size1.2 MiB
Facility
81783 
Community
29928 
Clinical Platforms (Chemists/PMVs/Dispensary)
 
4974
Clinical Platforms (PHCs/Private Clinics/Nursing Homes)
 
411
Community Based Organisation
 
160
Other values (3)
 
46

Length

Max length55
Median length8
Mean length10.02573699
Min length8

Characters and Unicode

Total characters1176039
Distinct characters37
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowFacility
2nd rowFacility
3rd rowFacility
4th rowFacility
5th rowFacility

Common Values

Value | Count | Frequency (%)
Facility | 81783 | 53.6%
Community | 29928 | 19.6%
Clinical Platforms (Chemists/PMVs/Dispensary) | 4974 | 3.3%
Clinical Platforms (PHCs/Private Clinics/Nursing Homes) | 411 | 0.3%
Community Based Organisation | 160 | 0.1%
Clinical Platforms (Laboratories) | 43 | < 0.1%
Clinical Platforms (TBAs) | 2 | < 0.1%
Clinical Platforms (Community Pharmacy) | 1 | < 0.1%
(Missing) | 35250 | 23.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
facility81783
63.2%
community30089
 
23.3%
platforms5431
 
4.2%
clinical5431
 
4.2%
chemists/pmvs/dispensary4974
 
3.8%
homes411
 
0.3%
phcs/private411
 
0.3%
clinics/nursing411
 
0.3%
based160
 
0.1%
organisation160
 
0.1%
Other values (3)46
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
i216472
18.4%
t122891
10.4%
y116847
9.9%
a98598
8.4%
l98487
8.4%
c87626
7.5%
F81783
 
7.0%
m70995
 
6.0%
n41636
 
3.5%
C41316
 
3.5%
Other values (27)199388
17.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter991551
84.3%
Uppercase Letter150851
 
12.8%
Space Separator12005
 
1.0%
Other Punctuation10770
 
0.9%
Open Punctuation5431
 
0.5%
Close Punctuation5431
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i216472
21.8%
t122891
12.4%
y116847
11.8%
a98598
9.9%
l98487
9.9%
c87626
8.8%
m70995
 
7.2%
n41636
 
4.2%
o36177
 
3.6%
s32310
 
3.3%
Other values (10)69512
 
7.0%
Uppercase Letter
ValueCountFrequency (%)
F81783
54.2%
C41316
27.4%
P11228
 
7.4%
M4974
 
3.3%
V4974
 
3.3%
D4974
 
3.3%
H822
 
0.5%
N411
 
0.3%
B162
 
0.1%
O160
 
0.1%
Other values (3)47
 
< 0.1%
Space Separator
ValueCountFrequency (%)
12005
100.0%
Open Punctuation
ValueCountFrequency (%)
(5431
100.0%
Other Punctuation
ValueCountFrequency (%)
/10770
100.0%
Close Punctuation
ValueCountFrequency (%)
)5431
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1142402
97.1%
Common33637
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
i216472
18.9%
t122891
10.8%
y116847
10.2%
a98598
8.6%
l98487
8.6%
c87626
7.7%
F81783
 
7.2%
m70995
 
6.2%
n41636
 
3.6%
C41316
 
3.6%
Other values (23)165751
14.5%
Common
ValueCountFrequency (%)
12005
35.7%
/10770
32.0%
(5431
16.1%
)5431
16.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1176039
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i216472
18.4%
t122891
10.4%
y116847
9.9%
a98598
8.4%
l98487
8.4%
c87626
7.5%
F81783
 
7.0%
m70995
 
6.0%
n41636
 
3.5%
C41316
 
3.5%
Other values (27)199388
17.0%

CBO_ID
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
SKEWED
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 14321
Missing (%): 9.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.004644399592
Minimum: 0
Maximum: 6
Zeros: 138071
Zeros (%): 90.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1461116338
Coefficient of variation (CV): 31.4597465
Kurtosis: 1197.346888
Mean: 0.004644399592
Median Absolute Deviation (MAD): 0
Skewness: 33.81683773
Sum: 642
Variance: 0.02134860953
Monotonicity: Not monotonic
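
The descriptive statistics above can be reproduced from the CBO_ID value counts reported in this section. The sketch below is a hedged illustration in plain Python, not the profiler's implementation: it assumes the sample variance uses n - 1 in the denominator and that the coefficient of variation is the standard deviation divided by the mean. The names `cbo_id_counts` and `describe` are hypothetical.

```python
import math

# Non-missing CBO_ID value counts copied from the report (value -> count).
cbo_id_counts = {0: 138071, 1: 18, 2: 4, 3: 34, 4: 40, 5: 30, 6: 34}


def describe(value_counts):
    """Compute basic statistics from a value -> count mapping."""
    n = sum(value_counts.values())
    total = sum(value * count for value, count in value_counts.items())
    mean = total / n
    # Sum of squared deviations, weighted by how often each value occurs.
    squared_dev = sum(count * (value - mean) ** 2 for value, count in value_counts.items())
    variance = squared_dev / (n - 1)  # assumption: sample variance (ddof=1)
    std = math.sqrt(variance)
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation
    }


stats = describe(cbo_id_counts)
for name, value in stats.items():
    print(name, value)
```

Under these assumptions the results agree with the reported sum (642), mean, variance, standard deviation, and CV to the precision shown in the report. With 90.5% of rows equal to zero, the large CV and skewness mainly reflect the handful of non-zero identifiers.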
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
0 | 138071 | 90.5%
4 | 40 | < 0.1%
3 | 34 | < 0.1%
6 | 34 | < 0.1%
5 | 30 | < 0.1%
1 | 18 | < 0.1%
2 | 4 | < 0.1%
(Missing) | 14321 | 9.4%

Value | Count | Frequency (%)
0 | 138071 | 90.5%
1 | 18 | < 0.1%
2 | 4 | < 0.1%
3 | 34 | < 0.1%
4 | 40 | < 0.1%
5 | 30 | < 0.1%
6 | 34 | < 0.1%

Value | Count | Frequency (%)
6 | 34 | < 0.1%
5 | 30 | < 0.1%
4 | 40 | < 0.1%
3 | 34 | < 0.1%
2 | 4 | < 0.1%
1 | 18 | < 0.1%
0 | 138071 | 90.5%

DATE_STARTED
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing23198
Missing (%)15.2%
Memory size1.2 MiB

enrolled_ovc
Categorical

MISSING

Distinct2
Distinct (%)< 0.1%
Missing11229
Missing (%)7.4%
Memory size1.2 MiB
0.0
141301 
1.0
 
22

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters423969
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value | Count | Frequency (%)
0.0 | 141301 | 92.6%
1.0 | 22 | < 0.1%
(Missing) | 11229 | 7.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0.0141301
> 99.9%
1.022
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0282624
66.7%
.141323
33.3%
122
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number282646
66.7%
Other Punctuation141323
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0282624
> 99.9%
122
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
.141323
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common423969
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0282624
66.7%
.141323
33.3%
122
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII423969
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0282624
66.7%
.141323
33.3%
122
 
< 0.1%

RECENCY_CONSENT
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing144524
Missing (%)94.7%
Memory size298.1 KiB
False
 
5933
True
 
2095
(Missing)
144524 
ValueCountFrequency (%)
False5933
 
3.9%
True2095
 
1.4%
(Missing)144524
94.7%

RECENCY_TESTING
Categorical

HIGH CORRELATION
MISSING

Distinct3
Distinct (%)< 0.1%
Missing14771
Missing (%)9.7%
Memory size1.2 MiB
No Documented Test Result
135095 
Long Term Infection
 
1987
Recent Infection
 
699

Length

Max length25
Median length25
Mean length24.86781196
Min length16

Characters and Unicode

Total characters3426312
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNo Documented Test Result
2nd rowNo Documented Test Result
3rd rowNo Documented Test Result
4th rowNo Documented Test Result
5th rowNo Documented Test Result

Common Values

ValueCountFrequency (%)
No Documented Test Result135095
88.6%
Long Term Infection1987
 
1.3%
Recent Infection699
 
0.5%
(Missing)14771
 
9.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
test135095
24.7%
documented135095
24.7%
result135095
24.7%
no135095
24.7%
infection2686
 
0.5%
term1987
 
0.4%
long1987
 
0.4%
recent699
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e546451
15.9%
409958
12.0%
t408670
11.9%
o274863
 
8.0%
u270190
 
7.9%
s270190
 
7.9%
n143153
 
4.2%
c138480
 
4.0%
m137082
 
4.0%
T137082
 
4.0%
Other values (11)690193
20.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2468615
72.0%
Uppercase Letter547739
 
16.0%
Space Separator409958
 
12.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e546451
22.1%
t408670
16.6%
o274863
11.1%
u270190
10.9%
s270190
10.9%
n143153
 
5.8%
c138480
 
5.6%
m137082
 
5.6%
d135095
 
5.5%
l135095
 
5.5%
Other values (4)9346
 
0.4%
Uppercase Letter
ValueCountFrequency (%)
T137082
25.0%
R135794
24.8%
N135095
24.7%
D135095
24.7%
I2686
 
0.5%
L1987
 
0.4%
Space Separator
ValueCountFrequency (%)
409958
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3016354
88.0%
Common409958
 
12.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e546451
18.1%
t408670
13.5%
o274863
9.1%
u270190
9.0%
s270190
9.0%
n143153
 
4.7%
c138480
 
4.6%
m137082
 
4.5%
T137082
 
4.5%
R135794
 
4.5%
Other values (10)554399
18.4%
Common
ValueCountFrequency (%)
409958
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3426312
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e546451
15.9%
409958
12.0%
t408670
11.9%
o274863
 
8.0%
u270190
 
7.9%
s270190
 
7.9%
n143153
 
4.2%
c138480
 
4.0%
m137082
 
4.0%
T137082
 
4.0%
Other values (11)690193
20.1%

CURRENT_STATUS
Categorical

HIGH CORRELATION

Distinct19
Distinct (%)< 0.1%
Missing12
Missing (%)< 0.1%
Memory size1.2 MiB
ART Start
78136 
Lost to Follow Up
24534 
ART Restart
12355 
HIV+ non ART
11671 
Known Death
8414 
Other values (14)
17430 

Length

Max length34
Median length9
Mean length11.60592631
Min length9

Characters and Unicode

Total characters1770368
Distinct characters43
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2
Unique (%)< 0.1%

Sample

1st rowART Start
2nd rowART Start
3rd rowART Start
4th rowART Start
5th rowART Transfer Out

Common Values

ValueCountFrequency (%)
ART Start78136
51.2%
Lost to Follow Up24534
 
16.1%
ART Restart12355
 
8.1%
HIV+ non ART11671
 
7.7%
Known Death8414
 
5.5%
ART Transfer Out8215
 
5.4%
ART Transfer In3691
 
2.4%
HIV+ Non ART2018
 
1.3%
Stopped Treatment1998
 
1.3%
HIV exposed status unknown678
 
0.4%
Other values (9)830
 
0.5%
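
The Common Values list for CURRENT_STATUS contains both "HIV+ non ART" and "HIV+ Non ART", which differ only in letter case. A minimal sketch of merging such variants follows. It assumes the two spellings denote the same status, which the report does not confirm, and the function name `merge_case_variants` is hypothetical.

```python
from collections import Counter

def merge_case_variants(value_counts):
    """Merge labels that differ only by letter case or surrounding whitespace."""
    merged = Counter()
    canonical = {}
    # Visit the most frequent spelling first so it becomes the canonical label
    for label, count in sorted(value_counts.items(), key=lambda kv: -kv[1]):
        key = label.strip().casefold()
        canonical.setdefault(key, label)
        merged[canonical[key]] += count
    return merged

# Counts copied from the Common Values list (assumption: same meaning)
counts = {"HIV+ non ART": 11671, "HIV+ Non ART": 2018, "ART Start": 78136}
merged = merge_case_variants(counts)
print(merged["HIV+ non ART"])
```

Keeping the most frequent spelling as the canonical label is a design choice; a fixed lookup table agreed with the data owners would be safer in practice.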

Length

Histogram of lengths of the category
ValueCountFrequency (%)
art116137
30.4%
start78186
20.4%
lost24534
 
6.4%
up24534
 
6.4%
to24534
 
6.4%
follow24534
 
6.4%
hiv14645
 
3.8%
non13689
 
3.6%
transfer12406
 
3.2%
restart12355
 
3.2%
Other values (18)36812
 
9.6%

Most occurring characters

ValueCountFrequency (%)
t255276
14.4%
229826
13.0%
T131042
 
7.4%
R128993
 
7.3%
o124111
 
7.0%
r117902
 
6.7%
A116638
 
6.6%
a114514
 
6.5%
S80298
 
4.5%
n63542
 
3.6%
Other values (33)408226
23.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter919661
51.9%
Uppercase Letter606639
34.3%
Space Separator229826
 
13.0%
Math Symbol13689
 
0.8%
Dash Punctuation550
 
< 0.1%
Other Punctuation1
 
< 0.1%
Open Punctuation1
 
< 0.1%
Close Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t255276
27.8%
o124111
13.5%
r117902
12.8%
a114514
12.5%
n63542
 
6.9%
s51995
 
5.7%
l49118
 
5.3%
e41575
 
4.5%
w33883
 
3.7%
p29468
 
3.2%
Other values (11)38277
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
T131042
21.6%
R128993
21.3%
A116638
19.2%
S80298
13.2%
U24648
 
4.1%
L24534
 
4.0%
F24534
 
4.0%
I18572
 
3.1%
H14645
 
2.4%
V14645
 
2.4%
Other values (6)28090
 
4.6%
Space Separator
ValueCountFrequency (%)
229826
100.0%
Dash Punctuation
ValueCountFrequency (%)
-550
100.0%
Math Symbol
ValueCountFrequency (%)
+13689
100.0%
Other Punctuation
ValueCountFrequency (%)
,1
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1526300
86.2%
Common244068
 
13.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
t255276
16.7%
T131042
 
8.6%
R128993
 
8.5%
o124111
 
8.1%
r117902
 
7.7%
A116638
 
7.6%
a114514
 
7.5%
S80298
 
5.3%
n63542
 
4.2%
s51995
 
3.4%
Other values (27)341989
22.4%
Common
ValueCountFrequency (%)
229826
94.2%
+13689
 
5.6%
-550
 
0.2%
,1
 
< 0.1%
(1
 
< 0.1%
)1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1770368
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t255276
14.4%
229826
13.0%
T131042
 
7.4%
R128993
 
7.3%
o124111
 
7.0%
r117902
 
6.7%
A116638
 
6.6%
a114514
 
6.5%
S80298
 
4.5%
n63542
 
3.6%
Other values (33)408226
23.1%

DATE_CURRENT_STATUS
Unsupported

REJECTED
UNSUPPORTED

Missing135
Missing (%)0.1%
Memory size1.2 MiB

REGIMENTYPE
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing22804
Missing (%)14.9%
Memory size1.2 MiB
ART First Line Adult
124251 
ART First Line Children
 
4694
ART Second Line Adult
 
759
ART Second Line Children
 
41
Third Line
 
3

Length

Max length24
Median length20
Mean length20.11541604
Min length10

Characters and Unicode

Total characters2609935
Distinct characters20
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowART First Line Adult
2nd rowART First Line Adult
3rd rowART First Line Adult
4th rowART First Line Adult
5th rowART First Line Adult

Common Values

ValueCountFrequency (%)
ART First Line Adult124251
81.4%
ART First Line Children4694
 
3.1%
ART Second Line Adult759
 
0.5%
ART Second Line Children41
 
< 0.1%
Third Line3
 
< 0.1%
(Missing)22804
 
14.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
line129748
25.0%
art129745
25.0%
first128945
24.8%
adult125010
24.1%
children4735
 
0.9%
second800
 
0.2%
third3
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
389238
14.9%
i263431
10.1%
A254755
 
9.8%
t253955
 
9.7%
n135283
 
5.2%
e135283
 
5.2%
r133683
 
5.1%
d130548
 
5.0%
T129748
 
5.0%
L129748
 
5.0%
Other values (10)654263
25.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1442221
55.3%
Uppercase Letter778476
29.8%
Space Separator389238
 
14.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i263431
18.3%
t253955
17.6%
n135283
9.4%
e135283
9.4%
r133683
9.3%
d130548
9.1%
l129745
9.0%
s128945
8.9%
u125010
8.7%
h4738
 
0.3%
Other values (2)1600
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
A254755
32.7%
T129748
16.7%
L129748
16.7%
R129745
16.7%
F128945
16.6%
C4735
 
0.6%
S800
 
0.1%
Space Separator
ValueCountFrequency (%)
389238
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2220697
85.1%
Common389238
 
14.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
i263431
11.9%
A254755
11.5%
t253955
11.4%
n135283
 
6.1%
e135283
 
6.1%
r133683
 
6.0%
d130548
 
5.9%
T129748
 
5.8%
L129748
 
5.8%
R129745
 
5.8%
Other values (9)524518
23.6%
Common
ValueCountFrequency (%)
389238
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2609935
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
389238
14.9%
i263431
10.1%
A254755
 
9.8%
t253955
 
9.7%
n135283
 
5.2%
e135283
 
5.2%
r133683
 
5.1%
d130548
 
5.0%
T129748
 
5.0%
L129748
 
5.0%
Other values (10)654263
25.1%

REGIMEN
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct63
Distinct (%)< 0.1%
Missing22799
Missing (%)14.9%
Memory size1.2 MiB
TDF(300mg)+3TC(300mg)+DTG(50mg)
91687 
TDF(300mg)+3TC(300mg)+EFV(600mg)
20388 
AZT(300mg)+3TC(150mg)+NVP(200mg)
 
6888
AZT(300mg)+3TC(150mg)+ABC(300mg)
 
1202
TDF(300mg)+3TC(300mg)+LPV/r(200/50mg)
 
1109
Other values (58)
 
8479

Length

Max length51
Median length31
Mean length31.30168859
Min length15

Characters and Unicode

Total characters4061488
Distinct characters42
Distinct categories8
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7
Unique (%)< 0.1%

Sample

1st rowTDF(300mg)+3TC(300mg)+DTG(50mg)
2nd rowTDF(300mg)+3TC(300mg)+DTG(50mg)
3rd rowTDF(300mg)+3TC(300mg)+DTG(50mg)
4th rowTDF(300mg)+3TC(300mg)+DTG(50mg)
5th rowTDF(300mg)+3TC(300mg)+DTG(50mg)

Common Values

ValueCountFrequency (%)
TDF(300mg)+3TC(300mg)+DTG(50mg)91687
60.1%
TDF(300mg)+3TC(300mg)+EFV(600mg)20388
 
13.4%
AZT(300mg)+3TC(150mg)+NVP(200mg)6888
 
4.5%
AZT(300mg)+3TC(150mg)+ABC(300mg)1202
 
0.8%
TDF(300mg)+3TC(300mg)+LPV/r(200/50mg)1109
 
0.7%
ABC(60mg)+3TC(30mg)+LPV/r(40/10mg)1019
 
0.7%
AZT(300mg)+3TC(150mg)+EFV(600mg)788
 
0.5%
TDF(300mg)+3TC(30mg)+DTG(50mg)717
 
0.5%
AZT(10mg/ml)+3TC(10mg/ml)+NVP(10mg/ml)674
 
0.4%
AZT/3TC/NVP(60/30/50mg)594
 
0.4%
Other values (53)4687
 
3.1%
(Missing)22799
 
14.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tdf(300mg)+3tc(300mg)+dtg(50mg91687
70.6%
tdf(300mg)+3tc(300mg)+efv(600mg20388
 
15.7%
azt(300mg)+3tc(150mg)+nvp(200mg6888
 
5.3%
azt(300mg)+3tc(150mg)+abc(300mg1202
 
0.9%
tdf(300mg)+3tc(300mg)+lpv/r(200/50mg1109
 
0.9%
abc(60mg)+3tc(30mg)+lpv/r(40/10mg1019
 
0.8%
azt(300mg)+3tc(150mg)+efv(600mg788
 
0.6%
tdf(300mg)+3tc(30mg)+dtg(50mg717
 
0.6%
azt(10mg/ml)+3tc(10mg/ml)+nvp(10mg/ml674
 
0.5%
azt/3tc/nvp(60/30/50mg594
 
0.5%
Other values (55)4720
 
3.6%

Most occurring characters

ValueCountFrequency (%)
0667275
16.4%
m387875
9.6%
g385220
9.5%
(385123
9.5%
)385123
9.5%
3373548
9.2%
T350656
8.6%
+255403
 
6.3%
D208791
 
5.1%
F138129
 
3.4%
Other values (32)524345
12.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1200572
29.6%
Uppercase Letter1038405
25.6%
Lowercase Letter779484
19.2%
Open Punctuation385123
 
9.5%
Close Punctuation385123
 
9.5%
Math Symbol255403
 
6.3%
Other Punctuation17345
 
0.4%
Space Separator33
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T350656
33.8%
D208791
20.1%
F138129
 
13.3%
C133469
 
12.9%
G93233
 
9.0%
V35193
 
3.4%
E21802
 
2.1%
A15716
 
1.5%
P13029
 
1.3%
Z11660
 
1.1%
Other values (4)16727
 
1.6%
Lowercase Letter
ValueCountFrequency (%)
m387875
49.8%
g385220
49.4%
r3332
 
0.4%
l2655
 
0.3%
d107
 
< 0.1%
o97
 
< 0.1%
i34
 
< 0.1%
a33
 
< 0.1%
z33
 
< 0.1%
t32
 
< 0.1%
Other values (4)66
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0667275
55.6%
3373548
31.1%
5106495
 
8.9%
624263
 
2.0%
115281
 
1.3%
212487
 
1.0%
41159
 
0.1%
833
 
< 0.1%
931
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(385123
100.0%
Close Punctuation
ValueCountFrequency (%)
)385123
100.0%
Math Symbol
ValueCountFrequency (%)
+255403
100.0%
Other Punctuation
ValueCountFrequency (%)
/17345
100.0%
Space Separator
ValueCountFrequency (%)
33
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2243599
55.2%
Latin1817889
44.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
m387875
21.3%
g385220
21.2%
T350656
19.3%
D208791
11.5%
F138129
 
7.6%
C133469
 
7.3%
G93233
 
5.1%
V35193
 
1.9%
E21802
 
1.2%
A15716
 
0.9%
Other values (18)47805
 
2.6%
Common
ValueCountFrequency (%)
0667275
29.7%
(385123
17.2%
)385123
17.2%
3373548
16.6%
+255403
 
11.4%
5106495
 
4.7%
624263
 
1.1%
/17345
 
0.8%
115281
 
0.7%
212487
 
0.6%
Other values (4)1256
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII4061488
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0667275
16.4%
m387875
9.6%
g385220
9.5%
(385123
9.5%
)385123
9.5%
3373548
9.2%
T350656
8.6%
+255403
 
6.3%
D208791
 
5.1%
F138129
 
3.4%
Other values (32)524345
12.9%

LAST_CLINIC_STAGE
Categorical

MISSING

Distinct6
Distinct (%)< 0.1%
Missing23531
Missing (%)15.4%
Memory size1.2 MiB
Stage I
91040 
Stage II
24113 
Stage III
12523 
Stage IV
 
1340
Stage II?
 
4

Length

Max length9
Median length7
Mean length7.391478907
Min length7

Characters and Unicode

Total characters953656
Distinct characters9
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowStage III
2nd rowStage I
3rd rowStage I
4th rowStage I
5th rowStage I

Common Values

ValueCountFrequency (%)
Stage I91040
59.7%
Stage II24113
 
15.8%
Stage III12523
 
8.2%
Stage IV1340
 
0.9%
Stage II?4
 
< 0.1%
Stage IIt1
 
< 0.1%
(Missing)23531
 
15.4%
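
The values "Stage II?" (4 rows) and "Stage IIt" (1 row) look like entry errors next to the four regular stage labels. One hedged way to handle them is sketched below: whitespace is normalized and anything that is not an exact match is returned as `None` for manual review. Whether "Stage II?" should instead be mapped to "Stage II" is a decision for the data owners; the names `VALID_STAGES` and `normalize_stage` are illustrative.

```python
import re

# Canonical labels as they appear in the report
VALID_STAGES = {"Stage I", "Stage II", "Stage III", "Stage IV"}

def normalize_stage(value):
    """Return the canonical stage label, or None when the value is not a clean match."""
    if value is None:
        return None
    # Collapse runs of whitespace and trim the ends before comparing
    cleaned = re.sub(r"\s+", " ", value).strip()
    return cleaned if cleaned in VALID_STAGES else None

for raw in ["Stage I", "Stage II?", "Stage IIt", " Stage  IV "]:
    print(repr(raw), "->", normalize_stage(raw))
```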

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
stage129021
50.0%
i91040
35.3%
ii24117
 
9.3%
iii12523
 
4.9%
iv1340
 
0.5%
iit1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
I178185
18.7%
t129022
13.5%
S129021
13.5%
a129021
13.5%
g129021
13.5%
e129021
13.5%
129021
13.5%
V1340
 
0.1%
?4
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter516085
54.1%
Uppercase Letter308546
32.4%
Space Separator129021
 
13.5%
Other Punctuation4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t129022
25.0%
a129021
25.0%
g129021
25.0%
e129021
25.0%
Uppercase Letter
ValueCountFrequency (%)
I178185
57.7%
S129021
41.8%
V1340
 
0.4%
Space Separator
ValueCountFrequency (%)
129021
100.0%
Other Punctuation
ValueCountFrequency (%)
?4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin824631
86.5%
Common129025
 
13.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
I178185
21.6%
t129022
15.6%
S129021
15.6%
a129021
15.6%
g129021
15.6%
e129021
15.6%
V1340
 
0.2%
Common
ValueCountFrequency (%)
129021
> 99.9%
?4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII953656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I178185
18.7%
t129022
13.5%
S129021
13.5%
a129021
13.5%
g129021
13.5%
e129021
13.5%
129021
13.5%
V1340
 
0.1%
?4
 
< 0.1%

LAST_VIRAL_LOAD
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct6330
Distinct (%)4.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8789.246007
Minimum0
Maximum10000000
Zeros101214
Zeros (%)66.3%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q330
95-th percentile546.45
Maximum10000000
Range10000000
Interquartile range (IQR)30

Descriptive statistics

Standard deviation138564.026
Coefficient of variation (CV)15.7651778
Kurtosis2207.693892
Mean8789.246007
Median Absolute Deviation (MAD)0
Skewness39.67085197
Sum1340817057
Variance1.919998931 × 10^10
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0101214
66.3%
4017650
 
11.6%
208596
 
5.6%
193232
 
2.1%
400391
 
0.3%
150169
 
0.1%
41121
 
0.1%
46119
 
0.1%
50117
 
0.1%
47115
 
0.1%
Other values (6320)20828
 
13.7%
ValueCountFrequency (%)
0101214
66.3%
120
 
< 0.1%
23
 
< 0.1%
32
 
< 0.1%
515
 
< 0.1%
62
 
< 0.1%
72
 
< 0.1%
82
 
< 0.1%
93
 
< 0.1%
1013
 
< 0.1%
ValueCountFrequency (%)
100000007
< 0.1%
99209281
 
< 0.1%
91433041
 
< 0.1%
83921611
 
< 0.1%
83800001
 
< 0.1%
79900001
 
< 0.1%
72282921
 
< 0.1%
68351231
 
< 0.1%
62798701
 
< 0.1%
58208031
 
< 0.1%
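
The descriptive statistics for LAST_VIRAL_LOAD are related by two standard identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The short sketch below recomputes both from the reported mean and standard deviation as a consistency check; it uses only the rounded figures printed in the report, so small rounding differences are expected.

```python
import math

# Values copied from the LAST_VIRAL_LOAD statistics
mean = 8789.246007
std = 138564.026

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the square of the standard deviation

print(f"CV = {cv:.4f}")
print(f"Variance = {variance:.4e}")
```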

LAST_CD4
Real number (ℝ)

SKEWED
ZEROS

Distinct1882
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean140.8458447
Minimum-355
Maximum2225727
Zeros113980
Zeros (%)74.7%
Negative3
Negative (%)< 0.1%
Memory size1.2 MiB

Quantile statistics

Minimum-355
5-th percentile0
Q10
median0
Q35
95-th percentile668
Maximum2225727
Range2226082
Interquartile range (IQR)5

Descriptive statistics

Standard deviation6935.721873
Coefficient of variation (CV)49.24335457
Kurtosis82788.46613
Mean140.8458447
Median Absolute Deviation (MAD)0
Skewness280.0604144
Sum21486315.31
Variance48104237.9
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0113980
74.7%
298
 
0.1%
1089
 
0.1%
387
 
0.1%
584
 
0.1%
32084
 
0.1%
483
 
0.1%
4382
 
0.1%
5080
 
0.1%
879
 
0.1%
Other values (1872)37806
 
24.8%
ValueCountFrequency (%)
-3551
 
< 0.1%
-2701
 
< 0.1%
-2611
 
< 0.1%
0113980
74.7%
0.21
 
< 0.1%
0.51
 
< 0.1%
0.61
 
< 0.1%
0.91
 
< 0.1%
173
 
< 0.1%
1.11
 
< 0.1%
ValueCountFrequency (%)
22257271
< 0.1%
14707421
< 0.1%
3083081
< 0.1%
2259461
< 0.1%
1880481
< 0.1%
994841
< 0.1%
813791
< 0.1%
781611
< 0.1%
479051
< 0.1%
452831
< 0.1%

LAST_CD4P
Real number (ℝ)

SKEWED
ZEROS

Distinct533
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.021743012
Minimum-494
Maximum9968
Zeros151427
Zeros (%)99.3%
Negative1
Negative (%)< 0.1%
Memory size1.2 MiB

Quantile statistics

Minimum-494
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum9968
Range10462
Interquartile range (IQR)0

Descriptive statistics

Standard deviation47.83585726
Coefficient of variation (CV)23.66070117
Kurtosis13449.17064
Mean2.021743012
Median Absolute Deviation (MAD)0
Skewness81.37770847
Sum308420.94
Variance2288.26924
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0151427
99.3%
926
 
< 0.1%
225
 
< 0.1%
424
 
< 0.1%
623
 
< 0.1%
122
 
< 0.1%
822
 
< 0.1%
720
 
< 0.1%
519
 
< 0.1%
318
 
< 0.1%
Other values (523)926
 
0.6%
ValueCountFrequency (%)
-4941
 
< 0.1%
0151427
99.3%
0.221
 
< 0.1%
0.371
 
< 0.1%
122
 
< 0.1%
1.721
 
< 0.1%
225
 
< 0.1%
2.291
 
< 0.1%
318
 
< 0.1%
3.761
 
< 0.1%
ValueCountFrequency (%)
99681
< 0.1%
45181
< 0.1%
31021
< 0.1%
25701
< 0.1%
25611
< 0.1%
24651
< 0.1%
23011
< 0.1%
20711
< 0.1%
20651
< 0.1%
19461
< 0.1%

DATE_LAST_CD4
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing110846
Missing (%)72.7%
Memory size1.2 MiB
Distinct1444
Distinct (%)1.9%
Missing75477
Missing (%)49.5%
Memory size1.2 MiB
Minimum2009-07-01 00:00:00
Maximum2108-03-23 00:00:00
Histogram with fixed size bins (bins=50)
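
The maximum of 2108-03-23 in the date summary above lies far in the future, which may indicate data-entry errors. A minimal sketch for flagging such dates is given below. The cutoff date and the function name `flag_future_dates` are illustrative choices, not values taken from the report.

```python
from datetime import date

def flag_future_dates(dates, cutoff):
    """Return the dates that fall strictly after the cutoff, skipping missing entries."""
    return [d for d in dates if d is not None and d > cutoff]

# The reported minimum and maximum, plus a missing entry, checked against
# an illustrative cutoff (assumption: the analyst picks a suitable date)
observed = [date(2009, 7, 1), date(2108, 3, 23), None]
suspicious = flag_future_dates(observed, cutoff=date(2021, 12, 31))
print(suspicious)
```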

VIRAL_LOAD_DUE_DATE
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing42522
Missing (%)27.9%
Memory size1.2 MiB

VIRAL_LOAD_TYPE
Categorical

HIGH CORRELATION
MISSING

Distinct4
Distinct (%)< 0.1%
Missing42522
Missing (%)27.9%
Memory size1.2 MiB
Baseline
50993 
Routine
34193 
Second
18936 
Repeat
5908 

Length

Max length8
Median length7
Mean length7.237653367
Min length6

Characters and Unicode

Total characters796359
Distinct characters15
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSecond
2nd rowRoutine
3rd rowSecond
4th rowSecond
5th rowSecond

Common Values

ValueCountFrequency (%)
Baseline50993
33.4%
Routine34193
22.4%
Second18936
 
12.4%
Repeat5908
 
3.9%
(Missing)42522
27.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
baseline50993
46.3%
routine34193
31.1%
second18936
 
17.2%
repeat5908
 
5.4%

Most occurring characters

ValueCountFrequency (%)
e166931
21.0%
n104122
13.1%
i85186
10.7%
a56901
 
7.1%
o53129
 
6.7%
B50993
 
6.4%
s50993
 
6.4%
l50993
 
6.4%
R40101
 
5.0%
t40101
 
5.0%
Other values (5)96909
12.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter686329
86.2%
Uppercase Letter110030
 
13.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e166931
24.3%
n104122
15.2%
i85186
12.4%
a56901
 
8.3%
o53129
 
7.7%
s50993
 
7.4%
l50993
 
7.4%
t40101
 
5.8%
u34193
 
5.0%
c18936
 
2.8%
Other values (2)24844
 
3.6%
Uppercase Letter
ValueCountFrequency (%)
B50993
46.3%
R40101
36.4%
S18936
 
17.2%

Most occurring scripts

ValueCountFrequency (%)
Latin796359
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e166931
21.0%
n104122
13.1%
i85186
10.7%
a56901
 
7.1%
o53129
 
6.7%
B50993
 
6.4%
s50993
 
6.4%
l50993
 
6.4%
R40101
 
5.0%
t40101
 
5.0%
Other values (5)96909
12.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII796359
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e166931
21.0%
n104122
13.1%
i85186
10.7%
a56901
 
7.1%
o53129
 
6.7%
B50993
 
6.4%
s50993
 
6.4%
l50993
 
6.4%
R40101
 
5.0%
t40101
 
5.0%
Other values (5)96909
12.2%

DATE_LAST_REFILL
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing22797
Missing (%)14.9%
Memory size1.2 MiB

DATE_NEXT_REFILL
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing22797
Missing (%)14.9%
Memory size1.2 MiB

LAST_REFILL_DURATION
Real number (ℝ≥0)

ZEROS

Distinct49
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean100.3901817
Minimum0
Maximum1800
Zeros30765
Zeros (%)20.2%
Negative0
Negative (%)0.0%
Memory size1.2 MiB

Quantile statistics

Minimum0
5-th percentile0
Q130
median90
Q3180
95-th percentile180
Maximum1800
Range1800
Interquartile range (IQR)150

Descriptive statistics

Standard deviation74.52399719
Coefficient of variation (CV)0.7423434835
Kurtosis0.122510215
Mean100.3901817
Median Absolute Deviation (MAD)90
Skewness-0.02255972235
Sum15314723
Variance5553.826158
MonotonicityNot monotonic
Histogram with fixed size bins (bins=49)
ValueCountFrequency (%)
18064917
42.6%
030765
20.2%
9024135
 
15.8%
3013983
 
9.2%
6013889
 
9.1%
152596
 
1.7%
14988
 
0.6%
120936
 
0.6%
150169
 
0.1%
4925
 
< 0.1%
Other values (39)149
 
0.1%
ValueCountFrequency (%)
030765
20.2%
31
 
< 0.1%
61
 
< 0.1%
724
 
< 0.1%
81
 
< 0.1%
103
 
< 0.1%
121
 
< 0.1%
131
 
< 0.1%
14988
 
0.6%
152596
 
1.7%
ValueCountFrequency (%)
18001
 
< 0.1%
4801
 
< 0.1%
3602
 
< 0.1%
3002
 
< 0.1%
2702
 
< 0.1%
2402
 
< 0.1%
2254
 
< 0.1%
18064917
42.6%
16818
 
< 0.1%
1601
 
< 0.1%

LAST_REFILL_SETTING
Categorical

CONSTANT
MISSING
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing127526
Missing (%)83.6%
Memory size1.2 MiB
Facility
25026 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters200208
Distinct characters7
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowFacility
2nd rowFacility
3rd rowFacility
4th rowFacility
5th rowFacility

Common Values

ValueCountFrequency (%)
Facility25026
 
16.4%
(Missing)127526
83.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
facility25026
100.0%

Most occurring characters

ValueCountFrequency (%)
i50052
25.0%
F25026
12.5%
a25026
12.5%
c25026
12.5%
l25026
12.5%
t25026
12.5%
y25026
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter175182
87.5%
Uppercase Letter25026
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i50052
28.6%
a25026
14.3%
c25026
14.3%
l25026
14.3%
t25026
14.3%
y25026
14.3%
Uppercase Letter
ValueCountFrequency (%)
F25026
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin200208
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i50052
25.0%
F25026
12.5%
a25026
12.5%
c25026
12.5%
l25026
12.5%
t25026
12.5%
y25026
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII200208
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i50052
25.0%
F25026
12.5%
a25026
12.5%
c25026
12.5%
l25026
12.5%
t25026
12.5%
y25026
12.5%

DATE_LAST_CLINIC
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing17811
Missing (%)11.7%
Memory size1.2 MiB

DATE_NEXT_CLINIC
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing21263
Missing (%)13.9%
Memory size1.2 MiB

DATE_TRACKED
Date

MISSING

Distinct1244
Distinct (%)50.5%
Missing150090
Missing (%)98.4%
Memory size1.2 MiB
Minimum2008-02-28 00:00:00
Maximum2020-01-27 00:00:00
Histogram with fixed size bins (bins=50)

OUTCOME
Categorical

HIGH CORRELATION
MISSING

Distinct13
Distinct (%)0.1%
Missing133094
Missing (%)87.2%
Memory size1.2 MiB
Did Not Attempt to Trace Patient
10123 
4944 
Lost to Follow Up
1525 
Died (Confirmed)
1082 
ART Transfer Out
 
901
Other values (8)
 
883

Length

Max length52
Median length32
Mean length21.24745606
Min length4

Characters and Unicode

Total characters413433
Distinct characters38
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowDid Not Attempt to Trace Patient
2nd rowDid Not Attempt to Trace Patient
3rd rowDid Not Attempt to Trace Patient
4th rowDid Not Attempt to Trace Patient
5th rowDid Not Attempt to Trace Patient

Common Values

ValueCountFrequency (%)
Did Not Attempt to Trace Patient10123
 
6.6%
4944
 
3.2%
Lost to Follow Up1525
 
1.0%
Died (Confirmed)1082
 
0.7%
ART Transfer Out901
 
0.6%
ART Restart550
 
0.4%
Stopped Treatment141
 
0.1%
Relocating91
 
0.1%
Pre-ART Transfer Out46
 
< 0.1%
Closer to new facility26
 
< 0.1%
Other values (3)29
 
< 0.1%
(Missing)133094
87.2%
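
The second most common OUTCOME value (4944 rows) has no visible label, and the Control category total of 9888 is exactly twice that count, so each of those values appears to consist of two control characters. One hedged way to handle them is to treat whitespace-only strings as missing. The sketch below assumes the characters are whitespace controls such as a carriage return and line feed, which the report does not confirm; the function name `blank_to_missing` is hypothetical.

```python
def blank_to_missing(value):
    """Map None and whitespace-only strings (including CR/LF) to None."""
    if value is None:
        return None
    stripped = value.strip()
    return stripped if stripped else None

# "\r\n" stands in for the unlabeled value (assumption, see above)
samples = ["Lost to Follow Up", "\r\n", "  ", None]
cleaned = [blank_to_missing(v) for v in samples]
print(cleaned)
```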

Length

Histogram of lengths of the category
ValueCountFrequency (%)
to11681
15.9%
patient10148
13.8%
trace10123
13.8%
attempt10123
13.8%
did10123
13.8%
not10123
13.8%
follow1525
 
2.1%
up1525
 
2.1%
lost1525
 
2.1%
art1451
 
2.0%
Other values (18)5206
7.1%

Most occurring characters

ValueCountFrequency (%)
t76610
18.5%
68927
16.7%
e34786
 
8.4%
o27780
 
6.7%
i22614
 
5.5%
a22069
 
5.3%
r13949
 
3.4%
T12733
 
3.1%
n12518
 
3.0%
d12489
 
3.0%
Other values (28)108958
26.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter267509
64.7%
Space Separator68927
 
16.7%
Uppercase Letter64849
 
15.7%
Control9888
 
2.4%
Open Punctuation1107
 
0.3%
Close Punctuation1107
 
0.3%
Dash Punctuation46
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t76610
28.6%
e34786
13.0%
o27780
 
10.4%
i22614
 
8.5%
a22069
 
8.2%
r13949
 
5.2%
n12518
 
4.7%
d12489
 
4.7%
p11930
 
4.5%
m11382
 
4.3%
Other values (10)21382
 
8.0%
Uppercase Letter
ValueCountFrequency (%)
T12733
19.6%
A11620
17.9%
D11205
17.3%
P10212
15.7%
N10123
15.6%
R2138
 
3.3%
U1550
 
2.4%
L1525
 
2.4%
F1525
 
2.4%
C1126
 
1.7%
Other values (2)1092
 
1.7%
Control
ValueCountFrequency (%)
4944
50.0%
4944
50.0%
Space Separator
ValueCountFrequency (%)
68927
100.0%
Dash Punctuation
ValueCountFrequency (%)
-46
100.0%
Open Punctuation
ValueCountFrequency (%)
(1107
100.0%
Close Punctuation
ValueCountFrequency (%)
)1107
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin332358
80.4%
Common81075
 
19.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
t76610
23.1%
e34786
10.5%
o27780
 
8.4%
i22614
 
6.8%
a22069
 
6.6%
r13949
 
4.2%
T12733
 
3.8%
n12518
 
3.8%
d12489
 
3.8%
p11930
 
3.6%
Other values (22)84880
25.5%
Common
ValueCountFrequency (%)
68927
85.0%
4944
 
6.1%
4944
 
6.1%
(1107
 
1.4%
)1107
 
1.4%
-46
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII413433
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t76610
18.5%
68927
16.7%
e34786
 
8.4%
o27780
 
6.7%
i22614
 
5.5%
a22069
 
5.3%
r13949
 
3.4%
T12733
 
3.1%
n12518
 
3.0%
d12489
 
3.0%
Other values (28)108958
26.4%

CAUSE_DEATH
Categorical

HIGH CARDINALITY
MISSING

Distinct103
Distinct (%)5.1%
Missing150543
Missing (%)98.7%
Memory size1.2 MiB
Unknown cause
754 
Other natural causes
293 
Other HIV disease resulting in other disease or conditions leading to death
242 
HIV disease resulting in other infectious and parasitic disease
138 
UNKNOWN
124 
Other values (98)
458 

Length

Max length91
Median length13
Mean length24.8765555
Min length3

Characters and Unicode

Total characters49977
Distinct characters68
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
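As a rough illustration of how such per-category character counts can be produced, the sketch below uses Python's standard `unicodedata` module. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical and cover only the categories that appear in this report; this is not necessarily how the report generator computes them.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode general-category codes to the
# long names used in the tables of this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
    "Cc": "Control",
}


def category_counts(values):
    """Count characters per Unicode general category over all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two values taken from the sample rows of this variable.
print(category_counts(["Skin Disorder", "HIV 2"]))
```

Running the same function over a whole column would give counts comparable to the "Most occurring categories" table.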

Unique

Unique61 ?
Unique (%)3.0%

Sample

1st rowSkin Disorder
2nd rowGIDDINESS
3rd rowB POSITIVE
4th rowHIV 2
5th rowALCOHOL COUNSELING

Common Values

Value | Count | Frequency (%)
Unknown cause | 754 | 0.5%
Other natural causes | 293 | 0.2%
Other HIV disease resulting in other disease or conditions leading to death | 242 | 0.2%
HIV disease resulting in other infectious and parasitic disease | 138 | 0.1%
UNKNOWN | 124 | 0.1%
Illness | 97 | 0.1%
Felt sick/bad | 59 | < 0.1%
Patient Dead | 39 | < 0.1%
HIV INFECTED | 38 | < 0.1%
HIV disease resulting in TB | 26 | < 0.1%
Other values (93) | 199 | 0.1%
(Missing) | 150543 | 98.7%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
other | 917 | 11.9%
unknown | 883 | 11.5%
disease | 792 | 10.3%
cause | 755 | 9.8%
hiv | 458 | 6.0%
in | 409 | 5.3%
resulting | 407 | 5.3%
causes | 310 | 4.0%
natural | 293 | 3.8%
to | 256 | 3.3%
Other values (206) | 2197 | 28.6%

Most occurring characters

ValueCountFrequency (%)
5670
 
11.3%
e5005
 
10.0%
n4702
 
9.4%
s4197
 
8.4%
a3587
 
7.2%
i3126
 
6.3%
t2869
 
5.7%
o2361
 
4.7%
r2108
 
4.2%
u1957
 
3.9%
Other values (58)14395
28.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter38632
77.3%
Space Separator5670
 
11.3%
Uppercase Letter5533
 
11.1%
Other Punctuation83
 
0.2%
Dash Punctuation29
 
0.1%
Decimal Number19
 
< 0.1%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Math Symbol1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U929
16.8%
O731
13.2%
I725
13.1%
N531
9.6%
H515
9.3%
V501
9.1%
E225
 
4.1%
T167
 
3.0%
D134
 
2.4%
K133
 
2.4%
Other values (16)942
17.0%
Lowercase Letter
ValueCountFrequency (%)
e5005
13.0%
n4702
12.2%
s4197
10.9%
a3587
9.3%
i3126
8.1%
t2869
7.4%
o2361
 
6.1%
r2108
 
5.5%
u1957
 
5.1%
d1801
 
4.7%
Other values (15)6919
17.9%
Decimal Number
ValueCountFrequency (%)
24
21.1%
04
21.1%
53
15.8%
93
15.8%
42
10.5%
12
10.5%
31
 
5.3%
Other Punctuation
ValueCountFrequency (%)
/69
83.1%
,11
 
13.3%
;1
 
1.2%
.1
 
1.2%
&1
 
1.2%
Space Separator
ValueCountFrequency (%)
5670
100.0%
Dash Punctuation
ValueCountFrequency (%)
-29
100.0%
Open Punctuation
ValueCountFrequency (%)
(5
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Math Symbol
ValueCountFrequency (%)
>1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin44165
88.4%
Common5812
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e5005
11.3%
n4702
 
10.6%
s4197
 
9.5%
a3587
 
8.1%
i3126
 
7.1%
t2869
 
6.5%
o2361
 
5.3%
r2108
 
4.8%
u1957
 
4.4%
d1801
 
4.1%
Other values (41)12452
28.2%
Common
ValueCountFrequency (%)
5670
97.6%
/69
 
1.2%
-29
 
0.5%
,11
 
0.2%
(5
 
0.1%
)5
 
0.1%
24
 
0.1%
04
 
0.1%
53
 
0.1%
93
 
0.1%
Other values (7)9
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII49977
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5670
 
11.3%
e5005
 
10.0%
n4702
 
9.4%
s4197
 
8.4%
a3587
 
7.2%
i3126
 
6.3%
t2869
 
5.7%
o2361
 
4.7%
r2108
 
4.2%
u1957
 
3.9%
Other values (58)14395
28.8%

AGREED_DATE
Date

HIGH CORRELATION
MISSING

Distinct6
Distinct (%)33.3%
Missing152534
Missing (%)> 99.9%
Memory size1.2 MiB
Minimum2018-01-22 00:00:00
Maximum2018-12-27 00:00:00
Histogram with fixed size bins (bins=6)

BIOMETRIC
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing11229
Missing (%)7.4%
Memory size1.2 MiB
0.0
99074 
1.0
42249 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters423969
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value | Count | Frequency (%)
0.0 | 99074 | 64.9%
1.0 | 42249 | 27.7%
(Missing) | 11229 | 7.4%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 99074 | 70.1%
1.0 | 42249 | 29.9%

Most occurring characters

ValueCountFrequency (%)
0240397
56.7%
.141323
33.3%
142249
 
10.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number282646
66.7%
Other Punctuation141323
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0240397
85.1%
142249
 
14.9%
Other Punctuation
ValueCountFrequency (%)
.141323
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common423969
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0240397
56.7%
.141323
33.3%
142249
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII423969
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0240397
56.7%
.141323
33.3%
142249
 
10.0%

PARTNERINFORMATION_ID
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing152552
Missing (%)100.0%
Memory size1.2 MiB

WARD
Categorical

HIGH CARDINALITY
MISSING

Distinct1103
Distinct (%)1.9%
Missing94466
Missing (%)61.9%
Memory size1.2 MiB
East Itam 2
 
2840
Ikot Ekpene Urban
 
2086
Central 1
 
1400
Bussa
 
1280
Mokwa Central
 
964
Other values (1098)
49516 

Length

Max length21
Median length7
Mean length8.38103846
Min length3

Characters and Unicode

Total characters486821
Distinct characters67
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique274 ?
Unique (%)0.5%

Sample

1st rowAuna Central
2nd rowAuna East
3rd rowAuna East
4th rowAuna Central
5th rowAuna East

Common Values

Value | Count | Frequency (%)
East Itam 2 | 2840 | 1.9%
Ikot Ekpene Urban | 2086 | 1.4%
Central 1 | 1400 | 0.9%
Bussa | 1280 | 0.8%
Mokwa Central | 964 | 0.6%
East Itam 1 | 838 | 0.5%
Numan I | 758 | 0.5%
Mbiabong Ikot | 678 | 0.4%
Itak | 610 | 0.4%
Afaha | 584 | 0.4%
Other values (1093) | 46048 | 30.2%
(Missing) | 94466 | 61.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
1 | 7606 | 7.7%
2 | 6506 | 6.6%
ikot | 5047 | 5.1%
east | 4789 | 4.8%
itam | 4307 | 4.4%
central | 3066 | 3.1%
urban | 2559 | 2.6%
ekpene | 2086 | 2.1%
3 | 2067 | 2.1%
abak | 1691 | 1.7%
Other values (1042) | 59093 | 59.8%

Most occurring characters

ValueCountFrequency (%)
a63927
 
13.1%
40733
 
8.4%
n26923
 
5.5%
o25212
 
5.2%
t24768
 
5.1%
k23137
 
4.8%
e21722
 
4.5%
i21439
 
4.4%
u16917
 
3.5%
I16827
 
3.5%
Other values (57)205216
42.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter342591
70.4%
Uppercase Letter81238
 
16.7%
Space Separator40733
 
8.4%
Decimal Number18362
 
3.8%
Other Punctuation2126
 
0.4%
Dash Punctuation1719
 
0.4%
Open Punctuation26
 
< 0.1%
Close Punctuation26
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I16827
20.7%
E10346
12.7%
A8411
10.4%
M6084
 
7.5%
B5534
 
6.8%
U4800
 
5.9%
N4088
 
5.0%
W3931
 
4.8%
C3693
 
4.5%
O3299
 
4.1%
Other values (15)14225
17.5%
Lowercase Letter
ValueCountFrequency (%)
a63927
18.7%
n26923
 
7.9%
o25212
 
7.4%
t24768
 
7.2%
k23137
 
6.8%
e21722
 
6.3%
i21439
 
6.3%
u16917
 
4.9%
b15978
 
4.7%
s15440
 
4.5%
Other values (15)87128
25.4%
Decimal Number
ValueCountFrequency (%)
17935
43.2%
26762
36.8%
32138
 
11.6%
4696
 
3.8%
5290
 
1.6%
0153
 
0.8%
7124
 
0.7%
8123
 
0.7%
690
 
0.5%
951
 
0.3%
Other Punctuation
ValueCountFrequency (%)
/2064
97.1%
.61
 
2.9%
&1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
40733
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1719
100.0%
Open Punctuation
ValueCountFrequency (%)
(26
100.0%
Close Punctuation
ValueCountFrequency (%)
)26
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin423829
87.1%
Common62992
 
12.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a63927
 
15.1%
n26923
 
6.4%
o25212
 
5.9%
t24768
 
5.8%
k23137
 
5.5%
e21722
 
5.1%
i21439
 
5.1%
u16917
 
4.0%
I16827
 
4.0%
b15978
 
3.8%
Other values (40)166979
39.4%
Common
ValueCountFrequency (%)
40733
64.7%
17935
 
12.6%
26762
 
10.7%
32138
 
3.4%
/2064
 
3.3%
-1719
 
2.7%
4696
 
1.1%
5290
 
0.5%
0153
 
0.2%
7124
 
0.2%
Other values (7)378
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII486821
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a63927
 
13.1%
40733
 
8.4%
n26923
 
5.5%
o25212
 
5.2%
t24768
 
5.1%
k23137
 
4.8%
e21722
 
4.5%
i21439
 
4.4%
u16917
 
3.5%
I16827
 
3.5%
Other values (57)205216
42.2%

Interactions

[Figures: 64 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
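The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` and the toy data are illustrative assumptions and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)


# Toy data (hypothetical): y is an exact increasing linear function of x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(pearson_r(xs, ys))  # approximately 1.0 for a perfectly increasing line
```

Rescaling or shifting either variable, for example `[10 * x + 3 for x in xs]`, leaves the result unchanged, which matches the invariance property stated above.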

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
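A minimal sketch of this rank-based calculation follows, assuming that tied values receive the average of their rank positions (a common convention, but an assumption here). The helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)


# Toy data (hypothetical): y = x**3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ordering of the values matters, the cubic relationship in the toy data yields the same ρ as a straight line would.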

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
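The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is hypothetical, and library implementations often use a tie-adjusted variant instead, so their results may differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / len(pairs)


# Toy data (hypothetical): one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the toy data, five of the six pairs are concordant and one is discordant, so τ is (5 - 1) / 6.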

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
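To make the nullity correlation shown in the heatmap concrete, the sketch below correlates the presence indicators of two columns. It assumes missing values are represented as `None`; the function name `nullity_correlation` and the toy columns are hypothetical, and the plotting library behind the heatmap may compute this differently.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        # A fully present or fully missing column carries no nullity signal.
        return None
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical): the first two are always missing together,
# the third is present exactly when the first is missing.
first = [1, None, 3, None]
second = ["x", None, "y", None]
third = [None, 2, None, 4]
print(nullity_correlation(first, second))
print(nullity_correlation(first, third))
```

A value near +1 means two columns tend to be present or missing together, a value near -1 means one tends to be present when the other is missing, and columns without any variation in missingness have no defined value.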

Sample

First rows

StateL.G.AFacility NamePATIENT_IDFACILITY_IDHOSPITAL_NUMUNIQUE_IDGENDERDATE_BIRTHAGEage_unitmarital_statuseducationOCCUPATIONSTATElgaentry_pointDATE_CONFIRMED_HIVDATE_ENROLLED_PMTCTSOURCE_REFERRALTIME_HIV_DIAGNOSIStb_statusPREGNANTBREASTFEEDINGDATE_REGISTRATIONSTATUS_REGISTRATIONENROLLMENT_SETTINGCBO_IDDATE_STARTEDenrolled_ovcRECENCY_CONSENTRECENCY_TESTINGCURRENT_STATUSDATE_CURRENT_STATUSREGIMENTYPEREGIMENLAST_CLINIC_STAGELAST_VIRAL_LOADLAST_CD4LAST_CD4PDATE_LAST_CD4DATE_LAST_VIRAL_LOADVIRAL_LOAD_DUE_DATEVIRAL_LOAD_TYPEDATE_LAST_REFILLDATE_NEXT_REFILLLAST_REFILL_DURATIONLAST_REFILL_SETTINGDATE_LAST_CLINICDATE_NEXT_CLINICDATE_TRACKEDOUTCOMECAUSE_DEATHAGREED_DATEBIOMETRICPARTNERINFORMATION_IDWARD
0NigerMagamaRural Hosp- Auna821710011NGRA0002235/20NaNFemale1995-06-01 00:00:0025year(s)SinglePrimaryUnemployedNigerMagamaOPD2020-06-06 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-06-06 00:00:00HIV+ Non ARTFacilityNaN2020-06-06 00:00:00NaNNaNNo Documented Test ResultART Start2020-06-06 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage III0.00.00.0NaNNaTNaNNaN2020-09-09 00:00:002020-12-06 00:00:0090NaN2020-09-06 00:00:002020-12-06 00:00:00NaTNaNNaNNaTNaNNaNNaN
1NigerMagamaRural Hosp- Auna821810011821/2/20NGRA6270010Male1990-01-01 00:00:0030year(s)WidowedQuranicEmployedNigerMagamaOPD2020-02-19 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-02-19 00:00:00HIV+ Non ARTFacility0.02020-02-19 00:00:000.0NaNNo Documented Test ResultART Start2020-02-19 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I766.00.00.0NaN2020-08-122021-02-12 00:00:00Second2021-04-17 00:00:002021-10-15 00:00:00180NaN2021-04-17 00:00:002021-10-15 00:00:00NaTNaNNaNNaT1.0NaNAuna Central
2NigerMagamaRural Hosp- Auna8219100111116/3/20NGRA2670011Female1995-01-01 00:00:0025year(s)DivorcedQuranicUnemployedNigerMagamaOPD2020-03-02 00:00:00NaTNaNNaNCurrently on INH prophylaxis002020-03-02 00:00:00HIV+ Non ARTFacility0.02020-03-02 00:00:000.0NaNNo Documented Test ResultART Start2020-03-02 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I774.00.00.0NaN2021-03-022022-03-02 00:00:00Routine2021-04-01 00:00:002021-10-22 00:00:00180NaN2021-04-01 00:00:002021-10-22 00:00:00NaTNaNNaNNaT1.0NaNAuna East
3NigerMagamaRural Hosp- Auna822010011NGRA6270026NaNFemale1996-01-01 00:00:0024year(s)SinglePrimaryUnemployedNigerMagamaOPD2020-05-25 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-05-25 00:00:00HIV+ Non ARTFacilityNaN2020-05-25 00:00:00NaNNaNNo Documented Test ResultART Start2020-05-25 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2020-10-19 00:00:002021-04-19 00:00:00180NaN2020-10-19 00:00:002021-04-09 00:00:00NaTNaNNaNNaTNaNNaNNaN
4NigerMagamaRural Hosp- Auna8221100110037/01/20NGRA627002Female1975-01-01 00:00:0045year(s)MarriedQuranicUnemployedNigerMagamaOPD2020-01-08 00:00:00NaTNaNNaNCurrently on INH prophylaxis002020-01-08 00:00:00HIV+ Non ARTFacility0.02020-01-08 00:00:000.0NaNNo Documented Test ResultART Transfer Out2021-01-04 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I19.00.00.0NaN2020-12-052021-06-05 00:00:00Second2020-11-09 00:00:002021-05-09 00:00:00180NaN2020-11-09 00:00:002021-05-09 00:00:00NaTNaNNaNNaT0.0NaNAuna East
5NigerMagamaRural Hosp- Auna8222100111957/20NGRA6270023Female1984-01-01 00:00:0036year(s)MarriedNoneUnemployedNigerMagamaOPD2020-04-29 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-04-29 00:00:00HIV+ Non ARTFacilityNaN2020-04-29 00:00:00NaNNaNNo Documented Test ResultART Start2020-04-29 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaN2020-10-012021-04-01 00:00:00Second2020-11-12 00:00:002021-05-12 00:00:00180NaN2020-11-12 00:00:002021-05-12 00:00:00NaTNaNNaNNaTNaNNaNAuna Central
6NigerMagamaRural Hosp- Auna822310011532/20NGRA627009Female2000-01-01 00:00:0020year(s)MarriedQuranicUnemployedNigerMagamaOPD2020-01-29 00:00:00NaTNaNNaNCurrently on INH prophylaxis002020-01-29 00:00:00HIV+ Non ARTFacility0.02020-01-29 00:00:000.0NaNNo Documented Test ResultART Start2020-01-29 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaN2020-10-012021-04-01 00:00:00Second2020-11-04 00:00:002021-05-03 00:00:00180NaN2020-11-04 00:00:002021-05-03 00:00:00NaTDid Not Attempt to Trace PatientNaNNaT0.0NaNAuna East
7NigerMagamaRural Hosp- Auna8224100111137/2/20NGRA6270012Male1985-01-01 00:00:0035year(s)MarriedSenior SecondaryEmployedNigerMagamaOPD2020-03-02 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-03-02 00:00:00HIV+ Non ARTFacility0.02020-03-02 00:00:000.0NaNNo Documented Test ResultART Transfer Out2020-05-19 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaT2020-09-03 00:00:00Baseline2020-03-02 00:00:002020-04-01 00:00:0030NaN2020-03-02 00:00:002020-04-01 00:00:00NaTNaNNaNNaT0.0NaNSalka Central
8NigerMagamaRural Hosp- Auna8225100112154/20NGRA6270025Male2004-01-01 00:00:0016year(s)SingleSenior SecondaryUnemployedNigerMagamaIn-patient2020-05-08 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-05-08 00:00:00HIV+ Non ARTFacility0.02020-05-08 00:00:000.0NaNNo Documented Test ResultART Start2020-05-08 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaN2020-10-012020-11-08 00:00:00Baseline2021-03-25 00:00:002021-10-29 00:00:00180NaN2021-03-25 00:00:002021-10-29 00:00:00NaTNaNNaNNaT1.0NaNKawo
9NigerMagamaRural Hosp- Auna822610011168/20NGRA627003Male1975-01-01 00:00:0045year(s)MarriedSenior SecondaryEmployedNigerMagamaOthers2019-01-01 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-01-11 00:00:00HIV+ Non ARTFacility0.02019-01-01 00:00:000.0NaNNo Documented Test ResultART Transfer Out2020-01-12 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2020-01-11 00:00:002020-02-10 00:00:0030NaN2019-01-01 00:00:002020-02-10 00:00:00NaTNaNNaNNaT0.0NaNAuna Central

Last rows

StateL.G.AFacility NamePATIENT_IDFACILITY_IDHOSPITAL_NUMUNIQUE_IDGENDERDATE_BIRTHAGEage_unitmarital_statuseducationOCCUPATIONSTATElgaentry_pointDATE_CONFIRMED_HIVDATE_ENROLLED_PMTCTSOURCE_REFERRALTIME_HIV_DIAGNOSIStb_statusPREGNANTBREASTFEEDINGDATE_REGISTRATIONSTATUS_REGISTRATIONENROLLMENT_SETTINGCBO_IDDATE_STARTEDenrolled_ovcRECENCY_CONSENTRECENCY_TESTINGCURRENT_STATUSDATE_CURRENT_STATUSREGIMENTYPEREGIMENLAST_CLINIC_STAGELAST_VIRAL_LOADLAST_CD4LAST_CD4PDATE_LAST_CD4DATE_LAST_VIRAL_LOADVIRAL_LOAD_DUE_DATEVIRAL_LOAD_TYPEDATE_LAST_REFILLDATE_NEXT_REFILLLAST_REFILL_DURATIONLAST_REFILL_SETTINGDATE_LAST_CLINICDATE_NEXT_CLINICDATE_TRACKEDOUTCOMECAUSE_DEATHAGREED_DATEBIOMETRICPARTNERINFORMATION_IDWARD
152542Akwa IbomItuWest Itam Public Health Center1608453071AKS/1321AKS/1321Female1983-06-10 00:00:0038year(s)SingleSenior SecondaryEmployedAkwa IbomIkonoHCT2021-05-28 00:00:00NaTNaNNaNTB suspected and referred for evaluation002021-05-28 00:00:00HIV+ Non ARTFacility0.02021-05-28 00:00:000.0NaNNaNART Start2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-06-25 00:00:0030NaN2021-05-28 00:00:002021-06-25 00:00:00NaTNaNNaNNaT1.0NaNIbiaku
152543Akwa IbomItuWest Itam Public Health Center1608463071TI/0177TI/0177Female1984-08-08 00:00:0037year(s)MarriedJunior SecondaryUnemployedAkwa IbomItuTransfer-in2016-01-11 00:00:002021-05-28NaNNaNCurrently on INH prophylaxis102021-05-28 00:00:00ART Transfer InFacility0.02016-01-11 00:00:000.0NaNNaNART Transfer In2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-08-24 00:00:0090NaN2021-05-28 00:00:002021-08-24 00:00:00NaTNaNNaNNaT1.0NaNEast Itam 4
152544Akwa IbomIkonoEdiene I Health Centre1608475840027/2127Male1975-05-28 00:00:0046year(s)MarriedSenior SecondaryEmployedAkwa IbomIkonoOPD2021-05-28 00:00:00NaTNaNNaNCurrently on INH prophylaxis002021-05-28 00:00:00HIV+ Non ARTFacility0.02021-05-28 00:00:000.0NaNNaNART Start2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-06-25 00:00:0030NaN2021-05-28 00:00:002021-06-28 00:00:00NaTNaNNaNNaT0.0NaNEdiene 1
152545Akwa IbomIkaUrua Inyang Primary Health Centre160848582AKS/011/CAM1/1267/PMVAKS/011/CAM1/1267/pmvFemale1971-05-28 00:00:0050year(s)SingleSenior SecondaryUnemployedAkwa IbomIkaOutreach2021-05-28 00:00:00NaTNaNNaNNo sign or symptoms of TB002021-05-28 00:00:00HIV Exposed Status UnknownClinical Platforms (Chemists/PMVs/Dispensary)0.0NaN0.0NaNNaNHIV Exposed Status Unknown2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-08-26 00:00:0090NaN2021-05-28 00:00:002021-08-26 00:00:00NaTNaNNaNNaT0.0NaNAchan 2
152546NigerMasheguMakera Model Primary Health Centre16084910026NGS/003/MAK/00125NGS/003/MAK/00125Female1970-06-15 00:00:0050year(s)MarriedNoneUnemployedNigerMasheguOutreach2020-06-18 00:00:00NaTNaNNaNNo sign or symptoms of TB002020-06-18 00:00:00HIV+ Non ARTCommunity0.02020-06-18 00:00:000.0NaNRecent InfectionART Restart2021-05-26 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-26 00:00:002021-08-26 00:00:0090NaN2021-05-26 00:00:002021-08-26 00:00:00NaTNaNNaNNaT1.0NaNManigi
152547Akwa IbomIkaNto Etuk Udo Health Centre1608505790129/21129/21Female1994-05-28 00:00:0027year(s)SingleSenior SecondaryUnemployedAkwa IbomIkaOPD2021-05-28 00:00:00NaTNaNNaNNo sign or symptoms of TB002021-05-28 00:00:00HIV+ Non ARTFacility0.02021-05-28 00:00:000.0NaNNaNART Start2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-08-26 00:00:0090NaN2021-05-28 00:00:002021-08-26 00:00:00NaTNaNNaNNaT0.0NaNAchan 3
152548Akwa IbomIkonoIkono General Hospital160851587AKS/12/TEC/1506AKS/12/TEC/1506Male1976-01-01 00:00:0045year(s)MarriedSenior SecondaryEmployedAkwa IbomIkonoOutreach2021-05-28 00:00:00NaTNaNNaNNo sign or symptoms of TB002021-05-28 00:00:00HIV+ Non ARTCommunity0.02021-05-28 00:00:000.0NaNNo Documented Test ResultART Start2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-08-25 00:00:0090NaN2021-05-28 00:00:002021-08-25 00:00:00NaTNaNNaNNaT0.0NaNEdiene 1
152549Akwa IbomIkonoIkono General Hospital160852587AKS/12/TEC/1507AKS/12/TEC/1507Female1992-01-01 00:00:0029year(s)MarriedSenior SecondaryEmployedAkwa IbomIkonoOutreach2021-05-28 00:00:00NaTNaNNaNNo sign or symptoms of TB002021-05-28 00:00:00HIV+ Non ARTCommunity0.02021-05-28 00:00:000.0NaNNo Documented Test ResultART Start2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-08-25 00:00:0090NaN2021-05-28 00:00:002021-08-25 00:00:00NaTNaNNaNNaT0.0NaNEdiene 1
152550Akwa IbomIkonoIkono General Hospital160853587AKS/12/TEC/1505AKS/12/TEC/1505Female1975-01-01 00:00:0046year(s)MarriedSenior SecondaryEmployedAkwa IbomIkonoOutreach2021-05-28 00:00:00NaTNaNNaNNo sign or symptoms of TB002021-05-28 00:00:00HIV+ Non ARTCommunity0.02021-05-28 00:00:000.0NaNNo Documented Test ResultART Start2021-05-28 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-28 00:00:002021-08-25 00:00:0090NaN2021-05-28 00:00:002021-08-25 00:00:00NaTNaNNaNNaT0.0NaNEdiene 1
152551NigerBorguWawa BHC16085410021BHW00000227BHW00000227Male1987-05-28 00:00:0034year(s)MarriedPrimaryUnemployedNigerBorguOutreach2021-05-30 00:00:00NaTNaNNaNNo sign or symptoms of TB002021-05-30 00:00:00HIV+ Non ARTCommunity0.02021-05-30 00:00:000.0NaNNaNART Start2021-05-30 00:00:00ART First Line AdultTDF(300mg)+3TC(300mg)+DTG(50mg)Stage I0.00.00.0NaNNaTNaNNaN2021-05-30 00:00:002021-10-30 00:00:00180NaN2021-05-30 00:00:002021-10-30 00:00:00NaTNaNNaNNaT1.0NaNWawa